As an ArgoUML contributor I'm going to blog my activities here, so that they may draw interest from other developers or help other developers doing tasks similar to what I've done. AND(!) the grand vision that makes an Argonaut what he is, TO THRIVE IN THE BIG DANGEROUS WORLD, TAKING THE Argo TO A GOOD SHORE ;-))

Wednesday, March 30, 2005

half way to drop 1


After looking a bit into the grammar I realized that it is only parsing the information! It doesn't store it in an AST for later processing. So this means that currently no parsed information is kept! This alone wouldn't make me throw away the current grammar. The main problem is parsing the information, and that is being done very well, as the current working tests show!

Just out of curiosity I'll look at the original C++ grammar by David Wigg et al... It uses a scheme similar to the one used in the java grammar. The parser makes calls at specific places in the grammar, so that the information is processed along with the parsing. The C++ grammar defines abstract methods which are implemented in some specialized parser class. For java reveng, the Modeller class gets called by the parser.
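The scheme described above, where the parser calls out to a separate object at specific points while parsing, can be sketched roughly as follows. This is only a toy illustration in Python, not the ANTLR-generated parser nor ArgoUML's Modeller: the names ModellerCallbacks, RecordingModeller and parse_classes, and the tiny "class Name { type member; };" input format, are assumptions made up for the example.

```python
import re
from abc import ABC, abstractmethod


class ModellerCallbacks(ABC):
    """Hypothetical callback interface, invoked by the parser while it parses."""

    @abstractmethod
    def begin_class(self, name): ...

    @abstractmethod
    def add_attribute(self, type_name, name): ...

    @abstractmethod
    def end_class(self): ...


class RecordingModeller(ModellerCallbacks):
    """Specialized implementation that stores what the parser reports."""

    def __init__(self):
        self.classes = {}
        self._current = None

    def begin_class(self, name):
        self._current = name
        self.classes[name] = []

    def add_attribute(self, type_name, name):
        self.classes[self._current].append((type_name, name))

    def end_class(self):
        self._current = None


def parse_classes(source, callbacks):
    """Toy parser: builds no AST, it fires callbacks as constructs are recognized."""
    for match in re.finditer(r"class\s+(\w+)\s*\{(.*?)\}\s*;", source, re.S):
        callbacks.begin_class(match.group(1))
        for member in re.finditer(r"(\w+)\s+(\w+)\s*;", match.group(2)):
            callbacks.add_attribute(member.group(1), member.group(2))
        callbacks.end_class()


modeller = RecordingModeller()
parse_classes("class Point { int x; int y; };", modeller)
print(modeller.classes)
```

The point of the design is that the parser keeps nothing itself; whatever is to be remembered has to be stored by the object receiving the calls.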

It is interesting to see that neither uses the AST functionalities of ANTLR. I wonder why? Maybe it's because these grammars were implemented in a pre-AST time...

After looking into the grammar I now understand that I don't need to remove the pre-processor directives in quadratic.i. So, I'm going to use the original version that comes with the "official" C++ grammar.

I should consider ways to reuse the official C++ grammar instead of keeping a specific version, into which it will be very difficult to integrate improvements made to David Wigg's grammar. A possible way, if they are interested, would be to have some special tags that would be pre-processed before the grammar is handed over to ANTLR for code generation. Maybe ANTLR even makes this task easy! It would be nice to have a single grammar that could be used with C++, java and python programs! I must consider this when I send the grammar to the ANTLR list.

It is nice to see that the ANTLR project also makes a big effort to be multi-language. The examples provided in the distribution show that this kind of multi-language support, built into one grammar, is possible. In the future I must check how this is being done and try to have the C++ grammar this way!

Fixing the C++ grammar

I think I know why the quadratic.i parsing fails! The version of ANTLR being used in ArgoUML is 2.7.2, while the version that David Wigg was using when he committed the last version of the grammar was 2.7.3. The current version is 2.7.5. So, the way to go would be to at least try the ported grammar with 2.7.3 and check if that is the problem...

Some very interesting news! There is a grammar for the C pre-processor, for java programs, in the ANTLR 2.7.5 distribution. This fits my needs perfectly! :-))


The module model is now updated to show the use of the ANTLR C++ grammar and committed. I'll proceed to the next step.


I've been bitten by the infamous zargo corruption bug while editing the C++ module model. Fortunately, I committed the previous version of the model in CVS, and this way I may revert to that version. The ArgoUML version I was using was 0.17.2, which is not advisable for production, as you may see in this report of mine...

Since I'm a developer for ArgoUML, I thought I could look into the log and try to figure out what was the relevant part to report the problem back to ArgoUML. Unfortunately, the log isn't configured to save the date as part of the traces... This I think qualifies as a bug and therefore is a possible fix for me to contribute.

So, I have these two identified problems in ArgoUML, which I must obviously help to solve. I'll create issues and try to fix them.

  • log4j configurations should have time with traces – This is very simple: I just need to add "%d" to the pattern in the log4j configuration files, which are in /argouml/src_new/org/argouml/resource. For configurations that use ConsoleAppender, the date isn't relevant, since the user sees the output in the console and doesn't need to distinguish between several working sessions in a log file. The main changes are in default.lcf and full.lcf.
  • ArgoUML must support by default UML foundation types
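The log4j change in the first item above could also be scripted. Below is a minimal sketch in Python, assuming the .lcf files use the usual log4j 1.x properties syntax with a ConversionPattern entry for each appender's layout; the appender name A1 and the sample pattern are made up for the example and are not taken from the actual ArgoUML files.

```python
def add_date_to_patterns(config_text):
    """Prepend '%d ' to every ConversionPattern value that has no date yet."""
    out = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        is_pattern = sep and key.strip().endswith(".ConversionPattern")
        if is_pattern and "%d" not in value:
            line = key + "=%d " + value.lstrip()
        out.append(line)
    return "\n".join(out)


# Hypothetical configuration fragment, for illustration only.
sample = "log4j.appender.A1.layout.ConversionPattern=%-5p [%t] %c - %m%n"
print(add_date_to_patterns(sample))
```

The check for "%d" is deliberately naive; in this sketch, choosing which files to touch (and skipping the ConsoleAppender ones) would still be done by hand.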


The plan must be updated to be more flexible. I already know that I have a parser that is capable of parsing simple files. This is a very important first step and I don't want to lose momentum by attempting to have a perfect parser that correctly parses more complex files, like quadratic.i. So, instead of spending all the effort on fixing the C++ grammar, I'll move forward into using the parsed information for something useful. To enable this I'm going to do some analysis and design on how I can do this. This is still part of the Fix the C++ grammar step of the plan, although the name won't indicate this. I would now call it a different step: analysis of how to use the parsed information.

For this I may check again what the java reveng does and maybe model it a bit in the C++ model... After that I'll make a new test case that explores the information contained in a parsed file. When this is done I may make the model of the C++ grammar and parser.

3rd plan for C++ reveng drop 1

  1. Learn how to make the ANTLR parser for debugging and build one. Estimated Effort (EE) = 4 Mh; Short Name (SN) – ANTLR parser 4 debugging

    2005-02-25 DONE – Not actually a fully debug-enabled parser, but one that uses the trace capabilities of the ANTLR generator. Note that this makes the parsing much slower and is only usable with small files! Actual Effort (AE) = 2:36

  2. Debug the C++ grammar and make it pass the tests. EE = 20 Mh; SN – Fix the C++ grammar

    2005-03-07 PARTIAL – It cleanly parses a simple class and a code snippet with which I was attempting to reproduce the current error in the parsing of quadratic.i. AE = 2:51

  3. Commit the result of this work and send it to Yolanda. Update the issue. EE = 3 Mh; SN – Commit, Yolanda and issue

    2005-03-09 DONE – I committed the work in progress and updated the issue. Due to the release of a stable version of ArgoUML the work was committed in branch cpp_reveng_work_while_0_18_release. I only sent Yolanda an e-mail of thanks. I'll send her the version that will be sent to the ANTLR list. AE = 1:36

  4. 2nd re-planning of C++ reveng drop 1. EE = 2 Mh; SN – 2nd re-planning of C++ reveng drop 1

    2005-03-12 DONE – Re-planned, updated the ProcessDashboard phases and documented it all here ;-). AE = 1:39

  5. Update the model to reflect the new package and the grammar use. EE = 2 Mh; SN – Module model update for the C++ grammar

    2005-03-16 DONE – Updated and committed. AE = 2:14

  6. Make tests that show how the parsed information may be used for reveng. If issues exist, analyse how it is done in java reveng and fix the grammar as needed. This includes creating new test cases, or improving the current ones, to prove how the parsed information may be used for reveng. EE = 5 Mh; SN – Prove that parsed information is useful for reveng

  7. Model the implementation of org.argouml.application.api.PluggableImport interface in the C++ reveng module. Generate the realization of the designed classes. If there are issues in the generation, report them in issuezilla. EE = 5 Mh; SN – Model and generate the realization of the PluggableImport interface

  8. Close the circle, by making the module support reveng of preprocessed C++ files. EE = 15 Mh; SN – Module support of reveng of pre-processed C++

  9. Send a working vanilla version of the grammar to the ANTLR list and announce its use within the ArgoUML project. Provide feedback as appropriate. Automate the adaptation of the files in the module build script. EE = 5 Mh; SN – Send grammar 2 ANTLR list

  10. Enjoy and celebrate the achievement! Go back to planning next drops. EE = 4 Mh; SN – Plan next drops

Why do I spend so much time planning? Why do I measure with such care the time I spend in my hobby? Why do I document the whole stuff so meticulously?

Friday, March 11, 2005

First C++ parser commit and type bug


With the upcoming 0.18 release of ArgoUML, I looked into the issues reported and fixed by Daniele Tamino, in order to verify the fixes. It is all working fine and therefore many of the issues are now verified.

I think that tomorrow I will look for a way to commit my current reveng work in a branch... DONE. I also sent an e-mail of thanks to Yolanda.


The quadratic.i test file can't be used as it comes with the original ANTLR C++ grammar (the one for C++ output). It contains pre-processor lines, which the TestCppGrammar test case doesn't remove before handing the file to the lexer. It isn't a problem. I removed these lines with a simple python script and I'm now debugging the grammar with this modified file. Parsing it still shows some problems, which I think I will handle by debugging the grammar in additional test cases with just the code snippets needed to make it fail.
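A script for removing the pre-processor lines can be very small. The following is a minimal sketch of the idea, not the actual script I used: the function name strip_preprocessor_lines and the sample text are made up for the example, and it simply assumes that every line whose first non-blank character is '#' is a pre-processor line.

```python
def strip_preprocessor_lines(text):
    """Drop every line whose first non-blank character is '#'."""
    kept = [line for line in text.splitlines(keepends=True)
            if not line.lstrip().startswith("#")]
    return "".join(kept)


# Made-up sample resembling pre-processed output, for illustration only.
sample = '# 1 "quadratic.cpp"\nint main()\n{\n#line 7\n    return 0;\n}\n'
print(strip_preprocessor_lines(sample))
```

Directives continued over several lines with a trailing backslash are not handled here; pre-processed output normally shouldn't contain them.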


The parser must really be built for debug to enable debugging! This parser isn't just my class, it is the antlr.Parser class! So, I'm going to use just the ANTLR generation options for tracing: -traceParser and -traceLexer. It is working, but, I'm going to start with the SimpleClass.cpp test first because the quadratic.i example from the ANTLR C++ grammar is way too complicated to start with.

The grammar works! :-)) It parsed SimpleClass.cpp as soon as I fixed it (removed the boolean)! I must test with quadratic.i, since previously the Ant target was failing because I wasn't copying the files to the build dir 8-!

New Bug: C++ generator is generating code with java types.

I've noticed a new bug in the C++ generator. ArgoUML enables selection of some built-in types according to the java language. One of these is the boolean, which is generated as boolean instead of bool. I bet this is happening for other types as well...

What should be happening is that ArgoUML enables selection of the standard UML types:

  • Boolean – it does enable java.lang.Boolean, but this is still java specific.
  • Integer – same as for Boolean, i.e., java.lang.Integer
  • ... all the types are java!!! No UML types?!

I must create an issue for this... Actually two: one for the global ArgoUML, to enable UML types by default, and another for the C++ module, to give warnings when java specific types are used and to translate these types to C++ types. It should also enable the use of specific C++ types, adding them just as java does. This might be done with an XMI file made available to users for importing, or by some option which makes available either UML types, java types, C++ types or some other...
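What the C++ module part of this could look like is sketched below in Python, just to show the idea of translating and warning at the same time. It is not the generator's code: the function name to_cpp_type is hypothetical, and apart from boolean to bool, which is the case I actually ran into, the entries in the mapping are my own assumptions and far from complete.

```python
import warnings

# Assumed mapping from java specific type names to C++ type names.
# Only "boolean" -> "bool" comes from the bug seen above; the rest is illustrative.
JAVA_TO_CPP = {
    "boolean": "bool",
    "java.lang.Boolean": "bool",
    "java.lang.Integer": "int",
    "java.lang.String": "std::string",
}


def to_cpp_type(type_name):
    """Translate a java specific type name to C++, warning when that happens."""
    cpp_name = JAVA_TO_CPP.get(type_name)
    if cpp_name is None:
        # Not a known java specific type: leave it untouched.
        return type_name
    warnings.warn(f"java specific type '{type_name}' translated to '{cpp_name}'")
    return cpp_name


print(to_cpp_type("boolean"))
print(to_cpp_type("double"))
```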

Another possible addition is a critic that warns users about the use of java types in models that aren't java specific. Maybe having a model nature as a tag would be a nice way to control whether such a critic is turned on. This model nature could later be extended so that only the generic rules and the rules of the specific language are turned on. This might also be applied to packages etc.
