Using the parser in other open source projects

Post by **sreinst1** » Tue Sep 23, 2008 11:31 am

Hi,

First, thanks for the great CodeLite IDE. I read that CodeLite started as a library project to provide code completion for other projects, based on Ctags, SQLite and a small C++ parser for details that Ctags does not provide.
My favorite editor is jEdit, and I've created several plugins for jEdit based on the open source tagging tools Ctags and GNU Global (similar to CScope). Like CodeLite, my plugins store the information from Ctags in a database. (In my case, the plugins access the database using JDBC, and the user can use whichever JDBC database engine he/she likes; the database parameters are configurable.)
Now I want to add context-sensitive features to jEdit, such as code-completion, "go to definition", "browse references", static call tree, etc. For this, I need a parser like the one that CodeLite has. Hence, I have some questions:
1. Is it okay for me to take the parser (and lexer) of CodeLite, and port them to ANTLR (a parser generator in Java)?
2. Can you give me a short explanation of how and when this parser is used? i.e. which source code sections are parsed by this parser, and when?

If it's okay for me to port the parser to ANTLR, and I find it suitable for my purposes, there are a few features that we can later share. For example, ANTLR is a Java tool, but it can generate the parser for the grammar in multiple target languages, so it can also generate a parser in C++. I want to use ANTLR because it's expected to be very efficient, and also because it now has an "IDE" built around it, AntlrWorks, which provides a very user-friendly way of editing the grammar and debugging it.
Also, I intend to combine several such tools together in order to provide features such as static call tree. A static call tree is based on two separate types of information provided by these tools: 1. Symbol references - direct references to a symbol are provided by Cscope or GNU Global. 2. Function definitions - provided by Ctags. To generate a call tree, I find the references to the function for which the tree is generated, then I find the functions containing these references, and so on. The CodeLite parser will be useful here for determining the context of the function for which the call tree is requested.
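
For illustration, the call-tree procedure described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the FunctionDef and Reference records, their fields, and both function names are hypothetical stand-ins for data that would come from Ctags and from Cscope / GNU Global, and the line range attached to each definition is assumed to be available somehow.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for a function definition (as Ctags might supply it).
// The line range is an assumption of this sketch.
struct FunctionDef {
    std::string name;
    std::string file;
    int firstLine;
    int lastLine;
};

// Hypothetical record for a symbol reference (as Cscope / GNU Global might supply it).
struct Reference {
    std::string symbol; // the function being referenced
    std::string file;
    int line;
};

// Name of the function whose line range contains the reference, or "" if none does.
std::string EnclosingFunction(const std::vector<FunctionDef>& defs, const Reference& ref)
{
    for (const FunctionDef& d : defs) {
        if (d.file == ref.file && d.firstLine <= ref.line && ref.line <= d.lastLine)
            return d.name;
    }
    return "";
}

// Walk upwards from `root`: find its references, then the functions containing
// those references, and so on. Result maps each callee to the set of its callers.
std::map<std::string, std::set<std::string>> BuildCallerTree(
    const std::string& root,
    const std::vector<FunctionDef>& defs,
    const std::vector<Reference>& refs)
{
    std::map<std::string, std::set<std::string>> callers;
    std::set<std::string> visited;
    std::vector<std::string> pending{root};
    while (!pending.empty()) {
        const std::string callee = pending.back();
        pending.pop_back();
        if (!visited.insert(callee).second)
            continue; // already expanded; this also stops on recursive calls
        for (const Reference& r : refs) {
            if (r.symbol != callee)
                continue;
            const std::string caller = EnclosingFunction(defs, r);
            if (caller.empty())
                continue; // reference outside any known function
            callers[callee].insert(caller);
            pending.push_back(caller);
        }
    }
    return callers;
}
```

The sketch matches symbols by plain name only, which is exactly where a context-aware parser would be needed to tell apart functions that share a name.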

A while ago I thought to write a library for such features, perhaps in C++, that will be used as a standalone process by other tools (like Ctags is used now). This will remove the dependency on the language of the user-tool. But eventually, I created my jEdit plugins in Java and avoided this extra indirection. If found suitable, maybe I can go back to my original thought.

Thanks,
Shlomy

Post by **eranif** » Tue Sep 23, 2008 12:31 pm

Hi Shlomi (שלום?)

sreinst1 wrote:First, thanks for the great CodeLite IDE

NP

sreinst1 wrote:I read that CodeLite started as a library project to provide code completion for other projects, based on Ctags, SQLite and a small C++ parser for details that Ctags does not provide.

Indeed, you got it right

sreinst1 wrote:1. Is it okay for me to take the parser (and lexer) of CodeLite, and port them to ANTLR (a parser generator in Java)?

As long as you comply with the GPL license.

sreinst1 wrote:2. Can you give me a short explanation of how and when this parser is used? i.e. which source code sections are parsed by this parser, and when?

Ok, in short:
CodeLite uses ctags to create a lookup table. The information is then stored in an SQLite database for future reference.
Now, let's say that you have an expression such as:

Code: Select all

wxString str;
str.BeforeFirst(wxT(',')).Trim().Trim(false).

Now, ctags has no knowledge of the language grammar; this is where the yacc parser kicks in. CodeLite calls it to determine what should be displayed after the last '.'

In general, there are 4 parsers which I have created, each to solve a small task.
The first parser is the 'scope parser', which determines the current scope:

Code: Select all

namespace foo {
void Class::bar() {
     | //<--- caret is here,

Running the scope parser will return 'foo::Class'
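
To make the idea concrete, here is a heavily simplified sketch of scope detection. It is not the CodeLite yacc grammar: it only tracks braces and uses two regular expressions, it ignores comments, string literals and many C++ constructs, and the function name ScopeAtCaret and the "<global>" fallback are assumptions made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the enclosing scope for the text that precedes the caret.
// Each '{' pushes a scope name (possibly empty), each '}' pops one.
std::string ScopeAtCaret(const std::string& textBeforeCaret)
{
    // "namespace foo", "class Foo", "struct Foo"
    static const std::regex named(R"(\b(?:namespace|class|struct)\s+([A-Za-z_]\w*))");
    // "Class::bar(" or "ns::Class::~Class(" : capture the qualifier part
    static const std::regex qualified(R"(((?:[A-Za-z_]\w*::)+)~?[A-Za-z_]\w*\s*\()");

    std::vector<std::string> stack;
    std::string header; // text seen since the last '{', '}' or ';'
    for (char c : textBeforeCaret) {
        if (c == '{') {
            std::smatch m;
            std::string name;
            if (std::regex_search(header, m, named)) {
                name = m[1].str();
            } else if (std::regex_search(header, m, qualified)) {
                name = m[1].str();
                name.erase(name.size() - 2); // drop the trailing "::"
            }
            stack.push_back(name);
            header.clear();
        } else if (c == '}') {
            if (!stack.empty())
                stack.pop_back();
            header.clear();
        } else if (c == ';') {
            header.clear();
        } else {
            header += c;
        }
    }

    std::string scope;
    for (const std::string& part : stack) {
        if (part.empty())
            continue; // anonymous block or unqualified function body
        if (!scope.empty())
            scope += "::";
        scope += part;
    }
    return scope.empty() ? "<global>" : scope;
}
```

For the snippet above (text up to the caret), this sketch yields "foo::Class".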

The second parser is the variable parser:
it parses text and searches for local variables in that scope. The output of this parser is std::list<Variable>.
Each Variable object holds information such as: isTemplate, name, type, typeScope, pattern (the text where it was found) and more.

This helps CodeLite provide code completion for local variables. In the above code (wxString str; ...)
it will provide a Variable for str, and set the type of str to wxString and the type scope to <global>.
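
As an illustration only, the shape of that output could look like the sketch below. The struct mirrors the field names mentioned above, but its exact layout is an assumption, and ParseSimpleDeclarations is a hypothetical regex-based toy that recognises only plain "Type name;" declarations, not a real grammar.

```cpp
#include <list>
#include <regex>
#include <string>

// Field names follow the description above; the real class may differ.
struct Variable {
    std::string name;
    std::string type;
    std::string typeScope;   // "<global>" when the type is unqualified
    std::string pattern;     // the text where the variable was found
    bool isTemplate = false; // never set by this toy parser
};

// Toy parser: finds declarations of the form "[scope::]Type name;".
// It would also accept things like "return x;", which a real parser rejects.
std::list<Variable> ParseSimpleDeclarations(const std::string& text)
{
    static const std::regex decl(
        R"(\b((?:[A-Za-z_]\w*::)*)([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*;)");
    std::list<Variable> vars;
    for (std::sregex_iterator it(text.begin(), text.end(), decl), end; it != end; ++it) {
        const std::smatch& m = *it;
        Variable v;
        std::string scope = m[1].str(); // e.g. "std::" or ""
        if (scope.size() >= 2)
            scope.erase(scope.size() - 2); // drop the trailing "::"
        v.typeScope = scope.empty() ? "<global>" : scope;
        v.type = m[2].str();
        v.name = m[3].str();
        v.pattern = m[0].str();
        vars.push_back(v);
    }
    return vars;
}
```

Run on the two-line wxString example, the toy returns a single Variable named str with type wxString and type scope <global>.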

The third parser is the expression parser; its task is to resolve expressions, for example:

Code: Select all

str.BeforeFirst(wxT(',')).Trim().Trim(false).

CodeLite breaks this expression into 'tokens' based on language operators (. -> ::) and passes them to the expression parser. It is mainly used to resolve casting, the 'this' keyword and some other issues of C++.

So the logic executed by CodeLite to resolve the above expression is:

Code: Select all

break expression to tokens
while more tokens
     parse_expression( token )
     if ok
          resolve name to type 
          if ok
             continue
end while
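
The loop above might be sketched in C++ along these lines. Everything here is an assumption made for illustration: the function names, the MemberTable lookup (standing in for the tags database) and the locals map (standing in for the variable parser output) are hypothetical, and the tokenizer does not cope with parentheses inside string or character literals.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Split on '.', '->' and '::' that appear outside parentheses.
std::vector<std::string> SplitExpression(const std::string& expr)
{
    std::vector<std::string> tokens;
    std::string current;
    int depth = 0;
    for (std::size_t i = 0; i < expr.size(); ++i) {
        const char c = expr[i];
        if (c == '(') ++depth;
        if (c == ')') --depth;
        const bool two = depth == 0 && i + 1 < expr.size() &&
                         ((c == '-' && expr[i + 1] == '>') ||
                          (c == ':' && expr[i + 1] == ':'));
        const bool one = depth == 0 && c == '.';
        if (one || two) {
            if (!current.empty())
                tokens.push_back(current);
            current.clear();
            if (two)
                ++i; // skip the second character of "->" or "::"
            continue;
        }
        current += c;
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}

// "Trim(false)" -> "Trim"; a token without '(' is returned unchanged.
std::string NameOf(const std::string& token)
{
    return token.substr(0, token.find('('));
}

// (type, member name) -> resulting type; a stand-in for the tags database.
using MemberTable = std::map<std::pair<std::string, std::string>, std::string>;

// Returns the type the expression resolves to, or "" as soon as one step fails.
std::string ResolveExpression(const std::string& expr,
                              const std::map<std::string, std::string>& locals,
                              const MemberTable& members)
{
    std::string type;
    bool first = true;
    for (const std::string& token : SplitExpression(expr)) {
        const std::string name = NameOf(token);
        if (first) {
            auto it = locals.find(name); // first token: a local variable
            if (it == locals.end())
                return "";
            type = it->second;
            first = false;
        } else {
            auto it = members.find({type, name}); // later tokens: members of `type`
            if (it == members.end())
                return "";
            type = it->second;
        }
    }
    return type;
}
```

The type returned for the whole expression is the scope whose members would then be offered after the trailing operator.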

The grammar files are located at: /path/to/codelite/sources/trunk/CxxParser/
There is a workspace there; open it with CodeLite.

This workspace generates the files needed for the CodeLite IDE workspace.

In CodeLite, whenever the user hits Ctrl-Space, the method 'WordCompletionCandidates' kicks in; I suggest you follow it in ctags_manager.cpp.
Whenever the user types -> . or :: the method 'AutoCompleteCandidates' kicks in.

For more information you can try asking on the IRC channel:
channel name: #codelite
server: irc.freenode.net

sreinst1 wrote:If it's okay for me to port the parser to ANTLR, and I find it suitable for my purposes, there are a few features that we can later share. For example, ANTLR is a Java tool, but it can generate the parser for the grammar in multiple target languages, so it can also generate a parser in C++. I want to use ANTLR because it's expected to be very efficient, and also because it now has an "IDE" built around it, AntlrWorks, which provides a very user-friendly way of editing the grammar and debugging it.

Like I said, it's OK for you to port it to ANTLR (I know this tool). I wrote my parser in YACC, and I think ANTLR can import YACC files.

sreinst1 wrote:1. Symbol references - direct references to a symbol are provided by Cscope or GNU Global

cscope is a poor tool for C++. I added a plugin for cscope only because I had a feature request and it is useful for C. Have a look at how I implemented refactoring in the code, and you will see that CodeLite can be very accurate.

sreinst1 wrote:A while ago I thought to write a library for such features, perhaps in C++, that will be used as a standalone process by other tools (like Ctags is used now). This will remove the dependency on the language of the user-tool. But eventually, I created my jEdit plugins in Java and avoided this extra indirection. If found suitable, maybe I can go back to my original thought.

This is a good idea. I too wanted to create the CodeCompletion as a library (it still is an independent module of the IDE; it has no build dependencies on other modules of the IDE), but the problem is that I fell in love with the idea of implementing the entire IDE to demonstrate it ...

Good luck,
Eran

Post by **sreinst1** » Tue Sep 23, 2008 12:57 pm

Hi,

שלום is good

First, thanks a lot for the detailed reply. I actually found out most of this information from the CodeLite source code, but there are some things missing:
1. Ctags does not tell you the types of variables at all. So, if you have something like:
void C::f() {
a. <-- caret is here
}
You need to find the type of C::a, which requires parsing the declaration of class C (or at least a part of it). Are the parsers invoked only when the user requests one of the actions that require them (e.g. completion, refactoring)? Or do you, for this reason, parse all files when they are added to the project?
2. When you get information from the parser, do you add it to the database and keep it up-to-date? Or otherwise, the parser will have to be invoked again for the same code the next time the user performs a context-sensitive action.
3. I had an idea to simplify things: Assuming any code in C++ is inside a function or a macro, I can rely on the output of Ctags for the file containing the code with the current expression, and find the function / macro that's defined in the highest line number that is smaller or equal to the current expression. Is that not good enough?

BTW, since jEdit is a text editor supporting a very wide variety of languages, and since the tools I rely on (Ctags, GNU Global) support several languages as well, I had in mind to design these features to also support a wide variety of languages, and introduce a model (a set of interfaces) that can be implemented for several target languages. But this is starting to look like Eclipse, and one of the things I want to avoid is the slowness, so I may give this up and create something tailored for C/C++, since this is the only language for which I need this in jEdit.

Another BTW - Using Ctags, I created a class browser plugin, and an outline plugin - which shows you the tags in the current buffer. In contrast with the outline feature provided in many other IDEs, I made it completely configurable. You can configure the sorting, filtering and grouping of tags, e.g. you can group all functions under 'Functions', or group all functions of some class under the class name. You can take some ideas from there for CodeLite. Maybe I can change this to some C++ library that you will be able to use as well.

Shlomy

Post by **eranif** » Tue Sep 23, 2008 1:19 pm

sreinst1 wrote:1. Ctags does not tell you the types of variables at all. So, if you have something like:
void C::f() {
a. <-- caret is here
}
You need to find the type of C::a, which requires parsing the declaration of class C (or at least a part of it). Are the parsers invoked only when the user requests one of the actions that require them (e.g. completion, refactoring)? Or do you, for this reason, parse all files when they are added to the project?

CodeLite invokes ctags when:
- The user saves a file
- Files are added to the project
- There is an explicit request (Retag file/Workspace/project)
so this data is kept up-to-date in the database.
The yacc parser is invoked every time (and it is extremely fast). Since ctags provides neither the type of a variable nor the return value of a function, I simply parse the 'pattern' provided by ctags for this. Have a look at the function:

Code: Select all

bool Language::VariableFromPattern(const wxString &in, const wxString &name, Variable &var)

and:

Code: Select all

bool Language::FunctionFromPattern(const wxString &in, clFunction &foo)
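
As a rough idea of what recovering a type from a ctags pattern involves, here is a simplified sketch based on plain string handling. It is not the code behind the two functions named above (which use the yacc parser); the helper names are hypothetical, and the input is assumed to be a ctags search pattern of the usual /^...$/ form.

```cpp
#include <cstddef>
#include <string>

// Remove the "/^" prefix and "$/" suffix that ctags wraps around a search pattern.
std::string StripPatternDelimiters(std::string pattern)
{
    if (pattern.compare(0, 2, "/^") == 0)
        pattern.erase(0, 2);
    if (pattern.size() >= 2 && pattern.compare(pattern.size() - 2, 2, "$/") == 0)
        pattern.erase(pattern.size() - 2);
    return pattern;
}

// Trim leading and trailing spaces and tabs.
std::string TrimBlanks(const std::string& s)
{
    const char* ws = " \t";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Everything on the pattern line that precedes "qualifiedName(" is taken to be
// the return type. Returns "" when the name is not found or nothing precedes it.
std::string ReturnTypeFromPattern(const std::string& pattern,
                                  const std::string& qualifiedName)
{
    const std::string line = StripPatternDelimiters(pattern);
    const std::size_t pos = line.find(qualifiedName + "(");
    if (pos == std::string::npos)
        return "";
    return TrimBlanks(line.substr(0, pos));
}
```

Because only the pattern line is inspected, the sketch returns an empty string whenever the return type is not on the same line as the function name.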

sreinst1 wrote:2. When you get information from the parser, do you add it to the database and keep it up-to-date? Or otherwise, the parser will have to be invoked again for the same code the next time the user performs a context-sensitive action.

No, the parser is invoked again (only the Yacc parser). Like I said above, ctags is invoked whenever a modified file is saved.

sreinst1 wrote:3. I had an idea to simplify things: Assuming any code in C++ is inside a function or a macro, I can rely on the output of Ctags for the file containing the code with the current expression, and find the function / macro that's defined in the highest line number that is smaller or equal to the current expression. Is that not good enough?

I am already doing such optimizations; have a look at:

Code: Select all

TagEntryPtr TagsManager::FirstFunctionOfFile(const wxFileName &fileName)

Code: Select all

TagEntryPtr TagsManager::FirstScopeOfFile(const wxFileName &fileName)

Code: Select all

TagEntryPtr TagsManager::FunctionFromFileLine(const wxFileName &fileName, int lineno)
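
The lookup discussed here (the function defined on the highest line number that is smaller than or equal to the current line) can be sketched with a binary search. This is not the implementation of the methods listed above; FunctionTag and NearestFunctionAtOrBefore are hypothetical names, and the tags of one file are assumed to be sorted by line number.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, minimal view of a function tag within one file.
struct FunctionTag {
    std::string name;
    int line; // line on which the function is defined
};

// `tags` must be sorted by ascending line.
// Returns the last tag whose line is <= lineno, or nullptr if there is none.
const FunctionTag* NearestFunctionAtOrBefore(const std::vector<FunctionTag>& tags,
                                             int lineno)
{
    // upper_bound finds the first tag that starts strictly after lineno
    auto it = std::upper_bound(
        tags.begin(), tags.end(), lineno,
        [](int line, const FunctionTag& tag) { return line < tag.line; });
    if (it == tags.begin())
        return nullptr; // lineno precedes every function in the file
    return &*(it - 1);
}
```

Note that this heuristic attributes code that follows a function's closing brace to that function, since a start line alone says nothing about where the body ends.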

sreinst1 wrote:BTW, since jEdit is a text editor supporting a very wide variety of languages, and since the tools I rely on (Ctags, GNU Global) support several languages as well, I had in mind to design these features to also support a wide variety of languages, and introduce a model (a set of interfaces) that can be implemented for several target languages. But this is starting to look like Eclipse, and one of the things I want to avoid is the slowness, so I may give this up and create something tailored for C/C++, since this is the only language for which I need this in jEdit.

I decided that CodeLite will remain a C/C++ IDE. I don't want it to become CodeBlocks/Eclipse; I prefer to remain "lite", focus on one thing and do it well.

sreinst1 wrote:Another BTW - Using Ctags, I created a class browser plugin, and an outline plugin - which shows you the tags in the current buffer. In contrast with the outline feature provided in many other IDEs, I made it completely configurable. You can configure the sorting, filtering and grouping of tags, e.g. you can group all functions under 'Functions', or group all functions of some class under the class name. You can take some ideas from there for CodeLite. Maybe I can change this to some C++ library that you will be able to use as well.

CodeLite also has an 'Outline' feature. It is not configurable as you describe, but by default it groups 'functions' and 'prototypes', globals, macros and others.

If you are using CodeLite, hit Ctrl-Shift-O to see what I mean, or switch to the 'Outline' tab.
In addition, I also added other navigation methods: Open resource (Ctrl-Shift-R) and Open type (Ctrl-Shift-T).

Eran

Post by **sreinst1** » Tue Sep 23, 2008 1:56 pm

eranif wrote:The yacc parser is invoked every time (and it is extremely fast). Since ctags provides neither the type of a variable nor the return value of a function, I simply parse the 'pattern' provided by ctags for this. Have a look at the function:

Code: Select all

bool Language::VariableFromPattern(const wxString &in, const wxString &name, Variable &var)

and:

Code: Select all

bool Language::FunctionFromPattern(const wxString &in, clFunction &foo)

>> Parsing only the pattern provided by Ctags may not be enough: many users follow a coding convention where the return type is specified on the line preceding the function header, and as far as I know the Ctags pattern covers only the header line itself.
BTW, I also noticed that CodeLite does not take levels of dereferencing into account. E.g. if I declare a variable 'a' of type 'A', then both "a." and "a->" generate the same code completion options (even though the second should not). I don't know (I didn't check) what it does when both can be used and have a different type as context (such as when the class defines the operator "->").

Post by **eranif** » Tue Sep 23, 2008 2:23 pm

sreinst1 wrote:Parsing only the pattern provided by Ctags may not be enough, many users use a coding convention where the returned type is specified on the line preceding the function header, and as far as I know the Ctags pattern is only for the header line itself.

And you are correct: in this case the code completion will fail, but it will work for most cases. Btw, CodeLite uses its own hacked version of ctags because of the same problem you described here: templates are often written like this:

Code: Select all

template <class T>
class MyT {
};

To resolve templates, I need the 'template <class T>' line, which is discarded by ctags. My version of ctags changes the pattern from:

Code: Select all

template <class T>
class MyT {
};

to

Code: Select all

template <class T> class MyT {};

sreinst1 wrote:BTW, I noticed also that CodeLite does not take into account levels of dereferencing.

Yes, it does not, and this is on purpose, for cases where the code completion would fail to identify it correctly. So I prefer to leave it in the hands of the user.
Btw, this information is available in the Variable class; there is a member named 'm_isPtr'.

However, I did improve the CodeCompletion, so in case the user types '::' as in:
MyClass:: CodeLite will display anything of type struct, class, union, enum or typedef in the scope of MyClass, while when typing:
MyClass cls; cls. CodeLite will display only methods / members
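
One possible reading of that rule, as a small sketch: after '::' only nested types of the scope are offered, and after '.' or '->' only methods and members. The TagKind enum, the Tag struct and CompletionCandidates are hypothetical names invented for this example and are not CodeLite types.

```cpp
#include <string>
#include <vector>

enum class TagKind { Class, Struct, Union, Enum, Typedef, Method, Member };

// Hypothetical, minimal view of a tag stored in the database.
struct Tag {
    std::string name;
    std::string scope; // the class or namespace that owns the tag
    TagKind kind;
};

// True for the kinds that name a type.
bool IsTypeKind(TagKind kind)
{
    switch (kind) {
    case TagKind::Class:
    case TagKind::Struct:
    case TagKind::Union:
    case TagKind::Enum:
    case TagKind::Typedef:
        return true;
    default:
        return false;
    }
}

// op is the operator that was just typed: "::", "." or "->".
std::vector<std::string> CompletionCandidates(const std::vector<Tag>& tags,
                                              const std::string& scope,
                                              const std::string& op)
{
    const bool wantTypes = (op == "::");
    std::vector<std::string> names;
    for (const Tag& tag : tags) {
        if (tag.scope == scope && IsTypeKind(tag.kind) == wantTypes)
            names.push_back(tag.name);
    }
    return names;
}
```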

Eran

Post by **sreinst1** » Tue Sep 23, 2008 8:06 pm

eranif wrote:
sreinst1 wrote:BTW, I noticed also that CodeLite does not take into account levels of dereferencing.
Yes, it does not, and this is on purpose, for cases where the code completion would fail to identify it correctly. So I prefer to leave it in the hands of the user.
Btw, this information is available in the Variable class; there is a member named 'm_isPtr'.

However, I did improve the CodeCompletion, so in case the user types '::' as in:
MyClass:: CodeLite will display anything of type struct, class, union, enum or typedef in the scope of MyClass, while when typing:
MyClass cls; cls. CodeLite will display only methods / members

Eran

Note that in general, it is not enough to know whether a variable is a pointer or not. You need to save the level of indirection (e.g. C ***p is not the same as C *p). It is especially important when operators such as "->" are overloaded by the relevant class. But it's a good enough approximation for most cases. It just depends on how accurate you want it to be. In the case of jEdit, I'm not even sure yet that I will go your way; I might even resort to using regular expressions completely, giving up the extra accuracy you get from your grammar. BTW, I tried to use CodeLite a bit, and it feels quite "heavy". Whenever I save a file, it takes a second or two until I get control back. I assume this is due to the invocation of Ctags and the processing of its output, which I suggest putting in the background (if it is not there already).
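
To illustrate the point about levels of indirection made above, a small sketch of the check a completion engine could apply. VariableType and the two functions are hypothetical names, and references are treated like plain objects.

```cpp
#include <string>

// Hypothetical description of a variable's declared type.
struct VariableType {
    std::string type;            // e.g. "C"
    int pointerDepth = 0;        // 0 for "C v", 1 for "C *p", 3 for "C ***p"
    bool overloadsArrow = false; // true if the class defines operator->
};

// "v." is well-formed only on an object, not on a pointer.
bool DotIsValid(const VariableType& v)
{
    return v.pointerDepth == 0;
}

// "v->" is well-formed on a single-level pointer, or on an object whose
// class overloads operator-> (the context is then that operator's result).
bool ArrowIsValid(const VariableType& v)
{
    return v.pointerDepth == 1 || (v.pointerDepth == 0 && v.overloadsArrow);
}
```

A single boolean "is pointer" flag cannot distinguish C *p from C ***p, whereas a depth counter rejects "->" on the latter.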

Post by **eranif** » Tue Sep 23, 2008 8:40 pm

sreinst1 wrote:BTW, I tried to use CodeLite a bit, and it feels quite "heavy"

It is almost instant here; I guess it depends on your machine configuration.

Which OS?
Revision?
What is your system configuration?
How large is the file?

sreinst1 wrote:which I suggest to put in the background (if not already).

Whenever you save a file, a message is sent to the background thread which in turn "talks" with ctags and passes the file name to be parsed.

The processing of the output is also done in the secondary thread. The only thing that needs the main thread is updating the tree "outline" view.

Eran

Post by **sreinst1** » Tue Sep 23, 2008 8:53 pm

eranif wrote: The processing of the output is also done in the secondary thread. The only thing that needs the main thread is updating the tree "outline" view.

Eran

Doesn't the update of the outline view depend on the process of Ctags and the database update? They must be completed before you can update the outline...
My OS: WinXP Professional
CPU: 3.0 Ghz, single core
Revision: Latest one available from the CodeLite site
File size: Pretty small (~200 lines)

Post by **eranif** » Tue Sep 23, 2008 9:08 pm

sreinst1 wrote:Doesn't the update of the outline view depend on the process of Ctags and the database update? They must be completed before you can update the outline...

It does.

This is what the worker thread does:
- Invokes ctags for the modified file
- Reads the ctags output (ctags runs as a daemon process, using the --filter=yes option of ctags)
- Performs a 'diff' operation between the tags that exist in the database for this file and the new ones received from ctags
- Updates the database with the new tags
- Sends 3 events to the outline view:
  - SymbolDelete - the event also contains a list of tags that should be removed from the outline view
  - SymbolUpdate - the event also contains a list of entries whose display/icon should be modified
  - SymbolNew - new items that should be added to the tree
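
The 'diff' step in the list above could be sketched as follows. This is an assumption-laden illustration rather than CodeLite's code: tags are reduced to a map from a unique key to a display string, and TagDiff and DiffTags are hypothetical names. The three result lists correspond to the payloads of the SymbolDelete, SymbolUpdate and SymbolNew events.

```cpp
#include <map>
#include <string>
#include <vector>

// Keys of the tags that fall into each of the three event categories.
struct TagDiff {
    std::vector<std::string> deleted; // in the database, no longer in the file
    std::vector<std::string> updated; // in both, but the display text changed
    std::vector<std::string> added;   // new in the file
};

// Both maps go from a unique tag key to its display text (e.g. a signature).
TagDiff DiffTags(const std::map<std::string, std::string>& oldTags,
                 const std::map<std::string, std::string>& newTags)
{
    TagDiff diff;
    for (const auto& entry : oldTags) {
        auto it = newTags.find(entry.first);
        if (it == newTags.end())
            diff.deleted.push_back(entry.first);
        else if (it->second != entry.second)
            diff.updated.push_back(entry.first);
    }
    for (const auto& entry : newTags) {
        if (oldTags.find(entry.first) == oldTags.end())
            diff.added.push_back(entry.first);
    }
    return diff;
}
```

Sending only these three lists lets the outline tree be patched in place instead of being rebuilt on every save.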

sreinst1 wrote:My OS: WinXP Professional
CPU: 3.0 Ghz, single core

This is probably why you see the delay. I will look into the code to see where it can be optimized.

Eran
