Using the parser in other open source projects
- sreinst1
- CodeLite Enthusiast
- Posts: 19
- Joined: Tue Sep 23, 2008 11:06 am
- Contact:
Using the parser in other open source projects
Hi,
First, thanks for the great CodeLite IDE. I read that CodeLite started as a library project to provide code completion for other projects, based on Ctags, SQLite and a small C++ parser for details that Ctags does not provide.
My favorite editor is jEdit, and I've created several plugins for jEdit based on the open source tagging tools Ctags and GNU Global (similar to CScope). Like CodeLite, my plugins store the information from Ctags in a database. (In my case, the plugins access the database using JDBC, and the user can use whichever JDBC database engine that he/she likes, the database parameters are configurable.)
Now I want to add context-sensitive features to jEdit, such as code-completion, "go to definition", "browse references", static call tree, etc. For this, I need a parser like the one that CodeLite has. Hence, I have some questions:
1. Is it okay for me to take the parser (and lexer) of CodeLite, and port them to ANTLR (a parser generator in Java)?
2. Can you give me a short explanation of how and when this parser is used? i.e. which source code sections are parsed by this parser, and when?
If it's okay for me to port the parser to ANTLR, and I find it suitable for my purposes, there are a few features that we can later share. For example, ANTLR is a Java tool, but it can generate the parser for the grammar in multiple target languages, so it can also generate a parser in C++. I want to use ANTLR because it's expected to be very efficient, and also because it now has an "IDE" built around it, AntlrWorks, which provides a very user-friendly way of editing and debugging the grammar.
Also, I intend to combine several such tools together in order to provide features such as static call tree. A static call tree is based on two separate types of information provided by these tools: 1. Symbol references - direct references to a symbol are provided by Cscope or GNU Global. 2. Function definitions - provided by Ctags. To generate a call tree, I find the references to the function for which the tree is generated, then I find the functions containing these references, and so on. The CodeLite parser will be useful here for determining the context of the function for which the call tree is requested.
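The call-tree procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Index, Location, FunctionDef, CallNode and both functions are hypothetical names, and the two maps stand in for whatever Cscope / GNU Global and Ctags would actually return.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// All types here are hypothetical stand-ins for data from the tagging tools.
struct Location { std::string file; int line; };
struct FunctionDef { std::string name; int line; };

struct Index {
    // symbol -> places that reference it (as from Cscope or GNU Global)
    std::map<std::string, std::vector<Location>> references;
    // file -> function definitions sorted by ascending line (as from Ctags)
    std::map<std::string, std::vector<FunctionDef>> functions;
};

// Name of the last function defined at or before `loc`, or "" if none.
std::string ContainingFunction(const Index& idx, const Location& loc) {
    auto f = idx.functions.find(loc.file);
    if (f == idx.functions.end()) return "";
    const std::vector<FunctionDef>& defs = f->second;
    auto it = std::upper_bound(defs.begin(), defs.end(), loc.line,
        [](int line, const FunctionDef& d) { return line < d.line; });
    return it == defs.begin() ? "" : (it - 1)->name;
}

struct CallNode { std::string name; std::vector<CallNode> callers; };

// Build the tree of callers of `name`. `path` holds the functions on the
// current branch so that recursion stops when a cycle is reached.
CallNode BuildCallerTree(const Index& idx, const std::string& name,
                         std::set<std::string>& path) {
    CallNode node{name, {}};
    if (!path.insert(name).second) return node;  // already on this branch
    std::set<std::string> seen;                  // list each caller once
    auto refs = idx.references.find(name);
    if (refs != idx.references.end()) {
        for (const Location& loc : refs->second) {
            std::string caller = ContainingFunction(idx, loc);
            if (caller.empty() || !seen.insert(caller).second) continue;
            node.callers.push_back(BuildCallerTree(idx, caller, path));
        }
    }
    path.erase(name);
    return node;
}
```

Calling BuildCallerTree(idx, "leaf", path) with an empty path returns the root node for "leaf" with its direct and indirect callers nested below it. The sketch resolves functions by name only, so overloads and same-named methods in different classes are not distinguished; that is the kind of context a real parser would have to supply.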
A while ago I thought to write a library for such features, perhaps in C++, that will be used as a standalone process by other tools (like Ctags is used now). This will remove the dependency on the language of the user-tool. But eventually, I created my jEdit plugins in Java and avoided this extra indirection. If found suitable, maybe I can go back to my original thought.
Thanks,
Shlomy
- eranif
- CodeLite Plugin
- Posts: 6375
- Joined: Wed Feb 06, 2008 9:29 pm
- Genuine User: Yes
- IDE Question: C++
- Contact:
Re: Using the parser in other open source projects
Hi Shlomi (שלום?)
sreinst1 wrote: First, thanks for the great CodeLite IDE
NP
sreinst1 wrote: I read that CodeLite started as a library project to provide code completion for other projects, based on Ctags, SQLite and a small C++ parser for details that Ctags does not provide.
Indeed, you got it right.
sreinst1 wrote: 1. Is it okay for me to take the parser (and lexer) of CodeLite, and port them to ANTLR (a parser generator in Java)?
As long as you comply with the GPL license.
sreinst1 wrote: 2. Can you give me a short explanation of how and when this parser is used? i.e. which source code sections are parsed by this parser, and when?
OK, in short:
CodeLite uses ctags to create a lookup table. The information is then stored in an SQLite database for future reference.
Now let's say that you have an expression such as:
Code: Select all
wxString str;
str.BeforeFirst(wxT(',')).Trim().Trim(false).
Now, ctags has no knowledge of grammar syntax; this is where the yacc parser kicks in: it is called by CodeLite to determine what should be displayed after the last '.'
In general there are 4 parsers which I have created, each to solve a small task.
The first parser is the 'scope parser', which determines the current scope:
Code: Select all
namespace foo {
void Class::bar() {
| //<--- caret is here,
Running the scope parser will return 'foo::Class'.
The second parser is the variable parser:
it parses text and searches for local variables in that scope. The output of this parser is std::list<Variable>.
Each Variable holds information such as: isTemplate, name, type, typeScope, pattern (the text where it was found) and more.
This helps CodeLite to provide code completion for local variables. In the above code (wxString str; ...)
it will provide a Variable for str, and set the type of str to wxString and the type scope to <global>.
The third parser is the expression parser; its task is to resolve expressions, for example:
Code: Select all
str.BeforeFirst(wxT(',')).Trim().Trim(false).
CodeLite breaks this expression into 'tokens' based on language operators (. -> ::) and passes them to the expression parser. It is mainly used to resolve casting, the 'this' keyword and some other issues of C++.
So the logic executed by CodeLite to resolve the above expression is:
Code: Select all
break expression to tokens
while more tokens
    parse_expression( token )
    if ok
        resolve name to type
        if ok
            continue
end while
The grammar files are located at: /path/to/codelite/sources/trunk/CxxParser/
There is a workspace there; open it with CodeLite.
This workspace generates the files needed for the CodeLite IDE workspace.
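The "break expression to tokens" step of the loop described in this reply could look something like the sketch below. It is not CodeLite's code: ExprToken and SplitExpression are hypothetical names, and a hand-written scanner stands in for the yacc-generated parser. It splits on '.', '->' and '::' and drops call arguments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: a name plus the operator that follows it.
struct ExprToken {
    std::string name;  // e.g. "str" or "BeforeFirst"
    std::string op;    // ".", "->", "::", or "" at the end of the input
};

// Split an expression on the operators ".", "->" and "::".
// Text inside parentheses (call arguments) is dropped, and quoted
// literals are skipped so that wxT(',') does not confuse the scan.
std::vector<ExprToken> SplitExpression(const std::string& expr) {
    std::vector<ExprToken> tokens;
    std::string current;
    int depth = 0;  // current parenthesis nesting
    for (std::size_t i = 0; i < expr.size(); ++i) {
        const char c = expr[i];
        if (c == '\'' || c == '"') {
            // Skip up to the closing quote, honouring backslash escapes.
            const char quote = c;
            ++i;
            while (i < expr.size() && expr[i] != quote) {
                if (expr[i] == '\\') ++i;
                ++i;
            }
            continue;
        }
        if (c == '(') { ++depth; continue; }
        if (c == ')') { if (depth > 0) --depth; continue; }
        if (depth > 0) continue;  // inside call arguments

        std::string op;
        if (c == '.') {
            op = ".";
        } else if (c == '-' && i + 1 < expr.size() && expr[i + 1] == '>') {
            op = "->";
            ++i;
        } else if (c == ':' && i + 1 < expr.size() && expr[i + 1] == ':') {
            op = "::";
            ++i;
        }
        if (!op.empty()) {
            tokens.push_back({current, op});
            current.clear();
        } else if (c != ' ' && c != '\t') {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back({current, ""});
    return tokens;
}
```

For the expression in the post, this yields the names str, BeforeFirst, Trim and Trim, each followed by '.'. Resolving each name to a type (the second half of the loop) would then need the tags database, which is outside the scope of this sketch.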
In CodeLite, whenever the user hits Ctrl-Space, the method 'WordCompletionCandidates' kicks in; I suggest you follow it in ctags_manager.cpp.
Whenever the user types ->, . or :: the method 'AutoCompleteCandidates' kicks in.
For more information you can try asking on the IRC channel:
channel name: #codelite
server: irc.freenode.net
sreinst1 wrote: If it's okay for me to port the parser to ANTLR, and I find it suitable for my purposes, there are a few features that we can later share. For example, ANTLR is a Java tool, but it can generate the parser for the grammar in multiple target languages, so it can also generate a parser in C++. I want to use ANTLR because it's expected to be very efficient, and also because it now has an "IDE" built around it, AntlrWorks, which provides a very user-friendly way of editing and debugging the grammar.
Like I said, it's OK for you to port it to ANTLR (I know this tool). I wrote my parser in YACC and I think ANTLR can import YACC files.
sreinst1 wrote: 1. Symbol references - direct references to a symbol are provided by Cscope or GNU Global
cscope is a poor tool for C++. I added a plugin for cscope only because I had a feature request and it is useful for C. Have a look at how I implemented refactoring in the code, and you will see that CodeLite can be very accurate.
sreinst1 wrote: A while ago I thought to write a library for such features, perhaps in C++, that will be used as a standalone process by other tools (like Ctags is used now). This will remove the dependency on the language of the user-tool. But eventually, I created my jEdit plugins in Java and avoided this extra indirection. If found suitable, maybe I can go back to my original thought.
This is a good idea. I too wanted to create the code completion as a library (it still is an independent module of the IDE, with no build dependencies on other modules of the IDE), but the problem is that I fell in love with the idea of implementing the entire IDE to demonstrate it ...
Good luck,
Eran
Make sure you have read the HOW TO POST thread
- sreinst1
- CodeLite Enthusiast
- Posts: 19
- Joined: Tue Sep 23, 2008 11:06 am
- Contact:
Re: Using the parser in other open source projects
Hi,
שלום is good
First, thanks a lot for the detailed reply. I actually found out most of this information from the CodeLite source code, but there are some things missing:
1. Ctags does not tell you the types of variables at all. So, if you have something like:
void C::f() {
a. <-- caret is here
}
You need to find the type of C::a, which requires parsing the declaration of class C (or at least a part of it). Are the parsers invoked only when the user requests one of the actions that require them (e.g. completion, refactoring)? Or do you, for this reason, parse all files when they are added to the project?
2. When you get information from the parser, do you add it to the database and keep it up-to-date? Or otherwise, the parser will have to be invoked again for the same code the next time the user performs a context-sensitive action.
3. I had an idea to simplify things: Assuming any code in C++ is inside a function or a macro, I can rely on the output of Ctags for the file containing the code with the current expression, and find the function / macro that's defined in the highest line number that is smaller or equal to the current expression. Is that not good enough?
BTW, since jEdit is a text editor supporting a very wide variety of languages, and since the tools I rely on (Ctags, GNU Global) support several languages as well, I had in mind to design these features to also support a wide variety of languages, and introduce a model (a set of interfaces) that can be implemented for several target languages. But this is starting to look like Eclipse, and one of the things I want to avoid is the slowness, so I may give this up and create something tailored for C/C++, since this is the only language for which I need this in jEdit.
Another BTW - Using Ctags, I created a class browser plugin, and an outline plugin - which shows you the tags in the current buffer. In contrast with the outline feature provided in many other IDEs, I made it completely configurable. You can configure the sorting, filtering and grouping of tags, e.g. you can group all functions under 'Functions', or group all functions of some class under the class name. You can take some ideas from there for CodeLite. Maybe I can change this to some C++ library that you will be able to use as well.
Shlomy
- eranif
- CodeLite Plugin
- Posts: 6375
- Joined: Wed Feb 06, 2008 9:29 pm
- Genuine User: Yes
- IDE Question: C++
- Contact:
Re: Using the parser in other open source projects
sreinst1 wrote: 1. Ctags does not tell you the types of variables at all. So, if you have something like:
void C::f() {
a. <-- caret is here
}
You need to find the type of C::a, which requires parsing the declaration of class C (or at least a part of it). Are the parsers invoked only when the user requests one of the actions that require them (e.g. completion, refactoring)? Or do you, for this reason, parse all files when they are added to the project?
CodeLite invokes ctags when:
- User saves file
- Files are added to the project
- Explicit request (Retag file/Workspace/project)
so this data is kept up-to-date in the database.
The yacc parser is invoked every time (and it is extremely fast). Since ctags provides neither the type of a variable nor the return value of a function, I simply parse the 'pattern' provided by ctags for this. Have a look at the functions:
Code: Select all
bool Language::VariableFromPattern(const wxString &in, const wxString &name, Variable &var)
and:
Code: Select all
bool Language::FunctionFromPattern(const wxString &in, clFunction &foo)
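To illustrate the idea of recovering a type from a ctags pattern, here is a deliberately naive sketch. It is not how Language::VariableFromPattern works (that goes through the yacc parser); StripPatternDelimiters and TypeFromPattern are hypothetical helpers, and the sketch assumes the usual ctags search-pattern form /^...$/ with the declaration on a single line.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Strip the "/^" prefix and "$/" suffix that ctags puts around a pattern.
std::string StripPatternDelimiters(std::string p) {
    if (p.size() >= 2 && p.compare(0, 2, "/^") == 0) p.erase(0, 2);
    if (p.size() >= 2 && p.compare(p.size() - 2, 2, "$/") == 0) p.erase(p.size() - 2);
    return p;
}

static bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Naive type extraction: everything before the first whole-word occurrence
// of `name` is taken as the type. Returns false when the name is not found
// or nothing precedes it. Declarations such as "int a, b;" give a wrong
// answer for b, which is why a real grammar is preferable.
bool TypeFromPattern(const std::string& pattern, const std::string& name,
                     std::string& type) {
    if (name.empty()) return false;
    const std::string line = StripPatternDelimiters(pattern);
    std::size_t pos = line.find(name);
    while (pos != std::string::npos) {
        const std::size_t end = pos + name.size();
        const bool leftOk = pos == 0 || !IsIdentChar(line[pos - 1]);
        const bool rightOk = end == line.size() || !IsIdentChar(line[end]);
        if (leftOk && rightOk) break;
        pos = line.find(name, pos + 1);
    }
    if (pos == std::string::npos) return false;

    // Trim surrounding whitespace from the text before the name.
    const std::string before = line.substr(0, pos);
    const std::size_t first = before.find_first_not_of(" \t");
    if (first == std::string::npos) return false;
    const std::size_t last = before.find_last_not_of(" \t");
    type = before.substr(first, last - first + 1);
    return true;
}
```

Given the pattern for the line "wxString str;" and the name str, the function reports the type wxString; given a line declaring "Foo *p;" it reports "Foo *".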
sreinst1 wrote: 2. When you get information from the parser, do you add it to the database and keep it up-to-date? Or otherwise, the parser will have to be invoked again for the same code the next time the user performs a context-sensitive action.
No, the parser is invoked again (only the yacc parser). Like I said above, ctags is invoked whenever a modified file is saved.
sreinst1 wrote: 3. I had an idea to simplify things: Assuming any code in C++ is inside a function or a macro, I can rely on the output of Ctags for the file containing the code with the current expression, and find the function / macro that's defined in the highest line number that is smaller or equal to the current expression. Is that not good enough?
I am already doing such optimizations; have a look at:
Code: Select all
TagEntryPtr TagsManager::FirstFunctionOfFile(const wxFileName &fileName)
Code: Select all
TagEntryPtr TagsManager::FirstScopeOfFile(const wxFileName &fileName)
Code: Select all
TagEntryPtr TagsManager::FunctionFromFileLine(const wxFileName &fileName, int lineno)
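The lookup discussed here (the function defined on the highest line that is still at or before the current line) can be sketched as a binary search over tags sorted by line. This is an illustration only, not the body of TagsManager::FunctionFromFileLine; FunctionTag and EnclosingFunction are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified tag record; a real ctags entry carries more fields.
struct FunctionTag {
    std::string name;
    int line;  // line on which the function is defined
};

// Return the function whose definition line is the greatest one that is
// <= lineno, or nullptr if lineno precedes every function in the file.
// `tags` must be sorted by ascending line.
const FunctionTag* EnclosingFunction(const std::vector<FunctionTag>& tags,
                                     int lineno) {
    // upper_bound finds the first tag defined strictly after lineno;
    // the candidate is the tag just before it.
    auto it = std::upper_bound(tags.begin(), tags.end(), lineno,
        [](int value, const FunctionTag& t) { return value < t.line; });
    if (it == tags.begin()) return nullptr;
    return &*(it - 1);
}
```

Because only start lines are known, code that sits between two functions (a global declaration, for example) is attributed to the preceding function. That is the approximation the question accepts.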
sreinst1 wrote: BTW, since jEdit is a text editor supporting a very wide variety of languages, and since the tools I rely on (Ctags, GNU Global) support several languages as well, I had in mind to design these features to also support a wide variety of languages, and introduce a model (a set of interfaces) that can be implemented for several target languages. But this is starting to look like Eclipse, and one of the things I want to avoid is the slowness, so I may give this up and create something tailored for C/C++, since this is the only language for which I need this in jEdit.
I decided that CodeLite will remain a C/C++ IDE. I don't want it to become CodeBlocks/Eclipse; I prefer to remain "lite", focus on one thing and do it well.
sreinst1 wrote: Another BTW - Using Ctags, I created a class browser plugin, and an outline plugin - which shows you the tags in the current buffer. In contrast with the outline feature provided in many other IDEs, I made it completely configurable. You can configure the sorting, filtering and grouping of tags, e.g. you can group all functions under 'Functions', or group all functions of some class under the class name. You can take some ideas from there for CodeLite. Maybe I can change this to some C++ library that you will be able to use as well.
CodeLite also has an 'Outline' feature. It is not configurable as you describe, but by default it groups 'functions' and 'prototypes', globals, macros and others.
If you are using CodeLite, hit Ctrl-Shift-O to see what I mean, or switch to the 'Outline' tab.
In addition, I also added other navigation methods: Open resource (Ctrl-Shift-R) and Open type (Ctrl-Shift-T).
Eran
- sreinst1
- CodeLite Enthusiast
- Posts: 19
- Joined: Tue Sep 23, 2008 11:06 am
- Contact:
Re: Using the parser in other open source projects
eranif wrote: The yacc parser is invoked every time (and it is extremely fast). Since ctags provides neither the type of a variable nor the return value of a function, I simply parse the 'pattern' provided by ctags for this. Have a look at the functions:
Code: Select all
bool Language::VariableFromPattern(const wxString &in, const wxString &name, Variable &var)
and:
Code: Select all
bool Language::FunctionFromPattern(const wxString &in, clFunction &foo)
Parsing only the pattern provided by Ctags may not be enough: many users follow a coding convention where the return type is written on the line preceding the function header, and as far as I know the Ctags pattern covers only the header line itself.
BTW, I also noticed that CodeLite does not take into account levels of dereferencing. E.g. if I declare a variable 'a' of type 'A', then "a." and "a->" both produce the same code completion options (even though the second should not). I don't know (I didn't check) what it does when both can be used and have a different type as context (such as when the class defines the operator "->").
- eranif
- CodeLite Plugin
- Posts: 6375
- Joined: Wed Feb 06, 2008 9:29 pm
- Genuine User: Yes
- IDE Question: C++
- Contact:
Re: Using the parser in other open source projects
sreinst1 wrote: Parsing only the pattern provided by Ctags may not be enough, many users use a coding convention where the returned type is specified on the line preceding the function header, and as far as I know the Ctags pattern is only for the header line itself.
And you are correct; in this case the code completion will fail, but it will work for most cases. Btw, CodeLite uses its own hacked version of ctags because of the same problem you described here: templates are often written like this:
Code: Select all
template <class T>
class MyT {
};
Code: Select all
template <class T>
class MyT {
};
Code: Select all
template <class T> class MyT {};
sreinst1 wrote: BTW, I noticed also that CodeLite does not take into account levels of dereferencing.
Yes, it does not, and this is on purpose, for cases where the code completion would fail to identify it correctly. So I prefer to leave it in the hands of the user.
Btw, this information is available in the Variable class; there is a member named 'm_isPtr'.
However, I did improve the code completion, so in case the user types '::' as in:
MyClass:: CodeLite will display anything of type struct, class, union, enum or typedef in the scope of MyClass, while typing:
MyClass cls; cls. CodeLite will only display methods / members.
Eran
- sreinst1
- CodeLite Enthusiast
- Posts: 19
- Joined: Tue Sep 23, 2008 11:06 am
- Contact:
Re: Using the parser in other open source projects
eranif wrote: Yes, it does not, and this is on purpose, for cases where the code completion would fail to identify it correctly. So I prefer to leave it in the hands of the user.
Btw, this information is available in the Variable class; there is a member named 'm_isPtr'.
However, I did improve the code completion, so in case the user types '::' as in:
MyClass:: CodeLite will display anything of type struct, class, union, enum or typedef in the scope of MyClass, while typing:
MyClass cls; cls. CodeLite will only display methods / members.
Note that in general, it is not enough to know whether a variable is a pointer or not. You need to save the level of indirection (e.g. C ***p is not the same as C *p). It is especially important when an operator such as "->" is overloaded by the relevant class. But it's a good enough approximation for most cases. It just depends on how accurate you want it to be. In the case of jEdit, I'm not even sure yet that I will go your way; I might even resort to using regular expressions completely, giving up the extra accuracy you get from your grammar.
BTW, I tried to use CodeLite a bit, and it feels quite "heavy". Whenever I save a file, it takes a second or two until I get control back. I assume this is due to the invocation of Ctags and the processing of its output, which I suggest putting in the background (if it is not there already).
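The point about levels of indirection can be made concrete with a small sketch. It is an illustration only: PointerDepth, Access and AccessIsValid are hypothetical names, the '*' count is a crude stand-in for real declarator parsing, and overloaded operator-> is deliberately ignored.

```cpp
#include <string>

// Count pointer levels in the type text of a declaration, e.g. "C ***" -> 3.
// Naive: counts every '*' and ignores typedefs, arrays and references.
int PointerDepth(const std::string& typeText) {
    int depth = 0;
    for (char c : typeText) {
        if (c == '*') ++depth;
    }
    return depth;
}

enum class Access { Dot, Arrow };

// Whether a member access is well-formed for built-in semantics:
// '.' needs an object (depth 0), '->' needs exactly one pointer level.
// A class that overloads operator-> would also accept '->' at depth 0;
// that case is not modelled here.
bool AccessIsValid(int pointerDepth, Access op) {
    return op == Access::Dot ? pointerDepth == 0 : pointerDepth == 1;
}
```

With a single boolean such as 'm_isPtr', "C *p" and "C ***p" look the same; with a depth, p-> can be offered for the first and suppressed for the second.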
- eranif
- CodeLite Plugin
- Posts: 6375
- Joined: Wed Feb 06, 2008 9:29 pm
- Genuine User: Yes
- IDE Question: C++
- Contact:
Re: Using the parser in other open source projects
sreinst1 wrote: BTW, I tried to use CodeLite a bit, and it feels quite "heavy"
It is almost instant here; I guess it depends on your machine configuration.
Which OS?
Revision?
What is your system configuration?
And how large is the file?
sreinst1 wrote: which I suggest to put in the background (if not already).
Whenever you save a file, a message is sent to the background thread, which in turn "talks" with ctags and passes it the name of the file to be parsed.
The processing of the output is also done in the secondary thread. The only thing that needs the main thread is updating the "outline" tree view.
Eran
- sreinst1
- CodeLite Enthusiast
- Posts: 19
- Joined: Tue Sep 23, 2008 11:06 am
- Contact:
Re: Using the parser in other open source projects
eranif wrote: The processing of the output is also done in the secondary thread. The only thing that needs the main thread is updating the "outline" tree view.
Doesn't the update of the outline view depend on the Ctags run and the database update? They must be completed before you can update the outline...
My OS: WinXP Professional
CPU: 3.0 Ghz, single core
Revision: Latest one available from the CodeLite site
File size: Pretty small (~200 lines)
- eranif
- CodeLite Plugin
- Posts: 6375
- Joined: Wed Feb 06, 2008 9:29 pm
- Genuine User: Yes
- IDE Question: C++
- Contact:
Re: Using the parser in other open source projects
sreinst1 wrote: Doesn't the update of the outline view depend on the process of Ctags and the database update? They must be completed before you can update the outline...
It does.
This is what the worker thread does:
- Invokes ctags for the modified file
- Reads the ctags output (ctags is running as a daemon process, using the --filter=yes option of ctags)
- Performs a 'diff' operation between the tags that exist in the database for this file and the new ones received from ctags
- Updates the database with the new tags
- Sends 3 events to the outline view:
  - SymbolDelete - the event also contains the list of tags that should be removed from the outline view
  - SymbolUpdate - the event also contains the list of entries whose display/icon should be modified
  - SymbolNew - new items that should be added to the tree
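The 'diff' step in the list above can be sketched as follows. This is not CodeLite's implementation: Tag, TagDiff and DiffTags are hypothetical names, and a tag is reduced to a unique key plus its display text. The three result lists correspond to what the SymbolDelete, SymbolUpdate and SymbolNew events would carry.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified tag: a unique key plus the text shown in the tree.
struct Tag {
    std::string key;      // e.g. scope-qualified name plus signature
    std::string display;  // label / icon information
};

struct TagDiff {
    std::vector<Tag> removed;  // only in the database    -> SymbolDelete
    std::vector<Tag> updated;  // in both, display differs -> SymbolUpdate
    std::vector<Tag> added;    // only in the ctags output -> SymbolNew
};

// Compare the tags stored for a file with the tags ctags just produced.
TagDiff DiffTags(const std::vector<Tag>& oldTags,
                 const std::vector<Tag>& newTags) {
    std::map<std::string, const Tag*> oldByKey;
    for (const Tag& t : oldTags) oldByKey[t.key] = &t;

    TagDiff diff;
    for (const Tag& t : newTags) {
        auto it = oldByKey.find(t.key);
        if (it == oldByKey.end()) {
            diff.added.push_back(t);
        } else {
            if (it->second->display != t.display) diff.updated.push_back(t);
            oldByKey.erase(it);  // matched, so it was not removed
        }
    }
    // Whatever is left in the map no longer exists in the file.
    for (const auto& kv : oldByKey) diff.removed.push_back(*kv.second);
    return diff;
}
```

Sending only these three lists to the outline view means the tree is patched in place instead of being rebuilt on every save.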
sreinst1 wrote: My OS: WinXP Professional
CPU: 3.0 Ghz, single core
Probably this is why you see this delay. I will look into the code to see where it can be optimized.
Eran