As an ArgoUML contributor I'm going to blog my activities here, so that they may draw interest by other developers or help other developers when doing tasks similar to what I've done. AND(!) the grand vision that makes an Argonaut what he is, TO THRIVE IN THE BIG DANGEROUS WORLD, TAKING THE Argo TO A GOOD SHORE ;-))

Sunday, November 15, 2009


From all the ideas my idiot's head has had in the past few years, one has been buzzing very strongly and that is medeia. The name Medea comes from the Argonautica book, so, it is related to ArgoUML and it is the name of a woman that helps the Argonauts and Jason in particular to get the Golden Fleece. She knows one thing or two about magic and uses it. Medeia is Medea in Portuguese related to mediate. Well, the name points a bit to what the idea is about...

medeia is a good name because it is similar to idea in Portuguese, which is written ideia.

As the responsible and main contributor to the ArgoUML C++ module, in the past 5 years, I've seen that this – making a language module – is a hard task. There is a lot that must be considered to provide production ready code generator, reverse engineering, notation and round the horn language module(s). Normally the effort is single handed and when you reach the end, there will always be loopholes and needs to extend it. But the worst part is how much of reinventing the wheel goes into it. Lets put the things in a clear way, by enumerating each part of a typical language support module:

  • UML Profile for the language:
    • You need a core language profile with stereotypes that may be used for tagging UML model elements, in order to apply tagged values to the model elements.
    • To use the profile in a programmatic way you need some wrapping classes, such as the class ProfileCpp in the C++ module.
    • Potentially you would also want to place in the module UML profiles for libraries that are used in plenty of projects that use the language – in the case of C++ we may have boost.

  • Language generator:
    • Use of a template engine for formatting the generated code. (For an example see StringTemplate.)
    • A code generator, which maps the UML model into an Abstract Syntax Tree (AST) which would be handed over to the template engine.
    • A way not to trash the code that already exist, like the current protected sections in the C++ generator.
    • Some form of mapping the documentation in the UML model into the language standard(s) format(s). And you need this to be customizable because projects typically develop specific variants of the documentation standards or there is some company standard that they must use. I would say that this fits more or less in the same problem/solution as the code generator being made by two parts – a mapping from UML to an AST and a template engine which is fed by the AST – but, I think that here there might be more needs for giving customization options for enhancing the AST generator.
    • Potentially there is the need to create code from specific UML diagrams or the model elements beneath specific diagrams. For instance, generating the classes that implement the state design pattern for a state machine (check also this nice article by Robert C. Martin (aka Uncle Bob) on finite state machines and UML.
      As a side note, you may not need to do this if you use the ArgoUML Pattern-Wizard module.

  • Language reverse engineering:
    • There must be a language grammar, typically, in ArgoUML, this is based in ANTLR and it is common that there is something for your target language.
    • With the language grammar you can generate a Parser, it might even be possible that an AST walker is generated for you, but, your mileage may vary and the available Parser and AST walker might not be fully customized to your needs. For instance, I was capable of getting a ANTLR C++ grammar ported to Java, but, this wasn't prepared to make an AST and I failed to change it into one. It has some bugs which up-to-now I didn't solve. Finally, I'm the one that must keep it evolving with the original grammar which was made for C++. Even the original grammar is now unsupported, which is problematic with the evolution of the C++ standard to C++ 0x.
    • Even with the AST walker there is the need to perform the mapping from language constructs to UML model elements. This mapping must be performed in a way that doesn't create duplicated model elements when the same source files are imported multiple times.
    • The grammars are normally created for blue sky situations where you have all the information, the source code is well formed, etc, etc. The reality is that the reverse engineering modules must be designed to handle malformed sources and work with incomplete information – i.e., partial imports.
    • In C++ you need a C preprocessor. Fortunately there is one open source C preprocessor implemented in Java which the C++ module now uses.
    • Users normally aren't satisfied by simple import and they normally want that the reverse engineering creates some simple UML diagrams from code. The Java module has this.
    • Finally, the killer feature of a UML reverse engineering module is for a user to point it to a build definition file – for instance a make file for C++ or an Ant build.xml file – and the reverse engineering module would interpret it and import the underlying project fully into a UML model.

  • Language notation – this enables the user to edit UML with the language notation.
    • There is a simple way to create a read-only notation for the language, which is basically to use the code generator in a partial way. That is what the C++ module does in the notation package.
    • The complex part is to create a parser for the notation. The best example is to check the parser that Michiel van der Wulp developed for UML notation. Neither C++ nor Java have a notation parser. If there is such a parser for the language notation it would enable fully editable UML models in the programming language.

  • Round the horn engineering for the language.
    • This is the Lucy in the sky with diamonds scenario. You edit the model and say update code and the code is magically updated without trashing the existing code and potentially adapting the dependent code to the changes you made. Even better, it would understand enough of the build definition to add new files to it or remove unused ones.
      My knowledge about tools that support this is that Together did a good job and that Rational Rose for M$ VC++ did a bad job at it. It is a long time since I tried tools such as these in a professional setting.

  • Language module settings user interface – finally, for each of these things you need to consider how the user is going to interact with it and configure it to his needs.
    • A nice GUI for some things, like the generator options concerning the mapping from UML model elements into the language constructs (e.g., see SettingsTabCpp, but, this has some mixture of several parts of the generator).
    • The reverse engineering also has a GUI which is placed in the dialog that is shown to the user when importing code. Currently the C++ reverse engineering module has no such GUI, but, you can see an example in the Java module – JavaImportSettings.
    • Then there are parts of the UI that may have the form of a template file, but, this is UI and, as such, the template engine must deal with the potential errors introduced by changes the users did.
    • All of this and more must somehow be documented in the user manual, which potentially is translated into other languages...

Description of medeia

So, enough of complaining myself about the fun, but, hard and difficult to finish work of developing language support for ArgoUML in the classic way. Lets now delve into what medeia is!

The medeia idea came to me a long time ago, but, in a gradual way. First I thought about a different solution to develop the reveng C++ module, which was to base it in the reuse of libraries made by the Eclipse CDT project. Then, the obvious stroke my mind and I thought of this as a potential solution for several other language modules, just as the Eclipse guys add support for more and more languages, the ArgoUML could grab it and reuse it. Recently, with the evolution of polyglot software development and the dynamic language boom, both Eclipse and NetBeans are adding support for multiple and interesting languages.

Then, with the Argoeclipse project, I thought that this would be a perfect opportunity to revisit this, but, instead of reusing libraries by placing them in the ArgoUML modules, we could just mediate the things that the existing Eclipse environment knows about the projects the user has in his Eclipse workspace and map those into ArgoUML and back, reusing for this the whole infrastructure that Eclipse has. Note that this is also possible for an eventual Argonetbeans.

Finally, while working at SISCOG and understanding the way Emacs connects via SLIME or ELI to a live Lisp and how that enables you to discover on a live and fully consistent software program about its static structure, I came to the conclusion that medeia could be a bit different for languages that have reflexion built-in (Java included)! You just need to place a little piece of software within the program you want to know about and inquire it from the ArgoUML side. No need to parse source files, understand build systems, etc, etc. Well, you still have some work for generating code and to provide a language notation, but, the medeia idea solves a large part of the problem of reverse engineering modules for a language. It creates some new, though...

Classical reveng module medeia reveng module
Language grammar, AST walker, parser, etc Interface with the IDE (for languages that don't have powerful enough reflexion facilities, such as C++, C, Fortran and etc) or with the live program (for languages that do have such reflexion facilities), that communicates with the ArgoUML module or with Argoeclipse or with Argonetbeans or with Argowhatever.
Mapping from language to UML. Ditto, also needed here.
Robust software to deal with incomplete information or malformed source code. Software that must support some way to filter the information that matters, maybe by allowing the user to specify that the import must be from specific packages or / and by allowing to get the information by specifying incomplete names – similar to IDEs' find type.
C preprocessor or similar things, largely dependent on language quirks. Nothing like that needed on this side, thanks :-)
UML diagrams auto-magically created by the import sources process. Ditto in here too, but, with the difference that probably the user might need to ask for diagram creation and update or offer some settings and try to do the correct action for different user actions.
Understanding build definitions to import full projects with the minimum work by the user. No need for this just for reverse engineering, but, in complex multiple-project scenarios, maybe the users want that the correct UML elements go into the correct UML models... The filtering described above might be a good way for certain languages, but, for languages that simply support namespaces or that are used in a flat package way, like Lisp, this wouldn't work so nicely.

Do you like this idea? Have you seen it applied to other UML tools or to solve similar problems? I recall that this is already used in the context of Emacs and Lisp / Scheme / Clojure, with the SLIME project or others similar to it. But for a UML tool you normally go the way of the classical reverse engineering design described above. Well, maybe Together worked this way when integrated within Eclipse...

1 comment:

lg said...

I like it, I think it is a very good idea!

Reader Shared items