Saturday, April 05, 2008

Resolving and persisting profile references

In the context of issue 4946 (Loading project which references non-default profile doesn't work) I needed to make some design changes to the profile sub-system and to the MDR implementation of the model sub-system of ArgoUML. At first I used what I now see as a brute force approach, banging on the existing code to try to make the problem go away. I'm glad it didn't work, because it made me start thinking and designing my way out of this priority 1 bug.

Basically, the problem arose when we tried to finish the profile sub-system in order to have a developer release of ArgoUML, release 0.25.5. Originally the UML profile for C++ was contained in the ArgoUML core. There were two types of profiles:

  • ArgoUML core profiles – these are XMI files contained in ArgoUML, being physically part of the argouml.jar.
  • User defined profiles – these are XMI files saved by the user, who intends to use them as his own profiles, normally for reuse in several models.

It was agreed that part of the work would be to enable the language modules to contain the corresponding profiles, and as such I tried to put the C++ profile in the ArgoUML C++ module. So, ArgoUML would now have three types of profiles:

  • core profiles;
  • module profiles;
  • user defined profiles.

Now I need to explain the idea of resolving references when loading XMI files...

An XMI file is an XML file. As such, it is possible to make a reference from one XMI file to another, and this is how in a future ArgoUML you can have a model and several profiles referenced from this model. But the references take the form URL#ID, where the URL refers to a different XMI file in which there must be a definition of a model element with the identifier ID.
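A minimal sketch of the URL#ID form, using only java.net.URI from the JDK. The class name XmiReferenceParts and the example reference are made up for illustration; they are not taken from ArgoUML.

```java
import java.net.URI;

class XmiReferenceParts {

    /** Returns the part before '#': the location of the referenced XMI file. */
    static String fileUrl(String reference) {
        // URI.create validates the syntax; the text itself is split by hand.
        String text = URI.create(reference).toString();
        int hash = text.indexOf('#');
        return hash < 0 ? text : text.substring(0, hash);
    }

    /** Returns the part after '#': the ID of the model element, or null if absent. */
    static String elementId(String reference) {
        return URI.create(reference).getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical reference, only for illustration.
        String ref = "http://example.org/profiles/cpp-profile.xmi#id42";
        System.out.println(fileUrl(ref));
        System.out.println(elementId(ref));
    }
}
```

The URL part is what has to be mapped to a place where the file can actually be read; the ID part is only looked up once that file is loaded.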

To enable offline work in the case of core and module profiles, and flexible directory locations in the case of user defined profiles, you need to add a middleman that resolves these beautiful URLs to system paths - enter the XmiReferenceResolverImpl.

The XmiReferenceResolverImpl was based on the AndroMDA code that did the same thing. It has some complex code that is able to resolve a reference from the net, the local disk or even the classpath. But the resolving of a reference to a profile was hard coded, with the base URL and the base system path being constants of XmiReferenceResolverImpl.

So, when the C++ profile was moved, the reference resolving didn't work any longer for it, since its system reference was now different. The move broke the initial solution of having the reference resolving hard coded in the MDR model implementation.

I tried very hard to make it work again, doing some more hard coding on the profile loading side and on the MDR implementation side:

  1. [2008-01-28 to 2008-02-05] Hack, hack, debug, debug, and I made the C++ profile module work :-)
  2. [2008-02-26 to 2008-03-04] Hack, hack, debug, debug, hack, hack and the user profiles work too :-))
  3. [2008-03-??] But wait, my automated test for the C++ profile persistency is now failing :-(
  4. [2008-03-?? to 2008-03-23] Debug, debuug, hack, haaack...
  5. [2008-03-24] Stop! Think a bit, and think a bit more and... This design for resolving references is broken.

So, I went back to the drawing board and started thinking about what is required to enable the reference resolving that we needed. There were some lessons learned:

  1. the XmiReferenceResolverImpl must not have hard coded base URLs and base system paths to resolve the profile references;
  2. when the XMI for a profile is read, the XmiReferenceResolverImpl must replace the given system reference with a public reference which is handed to it;
  3. the best place to know the system and public references for a given XMI is the place where the call to load the XMI is made.

Enter the ProfileReference, although, as Marcos Aurélio commented, it would be better to call it ModelReference.

ProfileReference class diagram.

Picture 7 – ProfileReference class diagram.

As seen, to stay DRY I added the CoreProfileReference and the UserProfileReference, but the language modules will in principle use the ProfileReference directly, or define their own XxxProfileReference. The ProfileModelLoader.loadModel(String path) method is deprecated and ProfileModelLoader.loadModel(ProfileReference reference) replaces it.

But, now, how do I get all these neat references into the realm of XmiReferenceResolverImpl? Well, some more design and I hope the ArgoUML powered sequence diagram of the interesting part of loading a profile XMI explains it.

Profile loading sequence diagram.

Picture 7 – profile loading sequence diagram.

As with many sequence diagrams, I only kept the important steps, but one significant detail is that there is a map of public to system references involved, which is kept at the MDRModelImplementation level. This is important, since the XmiReferenceResolverImpl and its creator, the XmiReaderImpl, are short lived - they are only kept around while the XMI is being loaded.
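The idea can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the names Reference, Registry and Resolver and their methods are hypothetical stand-ins, not the actual ProfileReference, MDRModelImplementation or XmiReferenceResolverImpl code, and the paths are invented.

```java
import java.util.HashMap;
import java.util.Map;

class ProfileReferenceSketch {

    /** Pairs the public reference written into XMI files with the place the file really lives. */
    static final class Reference {
        final String publicReference;
        final String systemPath;

        Reference(String publicReference, String systemPath) {
            this.publicReference = publicReference;
            this.systemPath = systemPath;
        }
    }

    /** Long lived: stands in for the map kept at the model implementation level. */
    static final class Registry {
        private final Map<String, String> publicToSystem = new HashMap<>();

        void register(Reference reference) {
            publicToSystem.put(reference.publicReference, reference.systemPath);
        }

        String systemPathFor(String publicReference) {
            return publicToSystem.get(publicReference);
        }
    }

    /** Short lived: created for one load, with no hard coded base URL or base path. */
    static final class Resolver {
        private final Registry registry;

        Resolver(Registry registry) {
            this.registry = registry;
        }

        /** Maps "publicUrl#id" to "systemPath#id"; unknown references are returned unchanged. */
        String resolve(String href) {
            int hash = href.indexOf('#');
            String url = hash < 0 ? href : href.substring(0, hash);
            String fragment = hash < 0 ? "" : href.substring(hash);
            String systemPath = registry.systemPathFor(url);
            return systemPath == null ? href : systemPath + fragment;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Both strings are invented examples.
        registry.register(new Reference(
                "http://example.org/profiles/cpp.xmi", "/modules/cpp/profiles/cpp.xmi"));
        System.out.println(new Resolver(registry)
                .resolve("http://example.org/profiles/cpp.xmi#id42"));
    }
}
```

Because the registry outlives every resolver, a reference registered while loading a profile is still known later, when a project that points to that profile is loaded.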

So, when I find myself working very hard in a debugging and hacking cycle, I will try to stop and think. I recall having already told myself this some time ago, but sometimes I forget this kind of thing, and there is nothing like a bump on the head (or several, as in this case) to make one remember the basics again ;-)


Saturday, March 08, 2008

ArgoUML's repository restructuring

ArgoUML core is undergoing a repository restructuring and I'm working on adapting the argouml-cpp build to the new structure. Issue 4625 is where the requirements for the changes are documented and there has been some dev mailing list activity on this. Linus Tolke is heading the effort.

We want to keep two build mechanisms:

  • Plain Ant build via the command line and without Eclipse.
  • Eclipse based build, the idea being to make it easier for new contributors to get started.

I have been adapting the Ant build file to the new structure, but, I had some problems with making it work for both purposes. The big issue is to generate the source files from the ANTLR grammar. In the end we figured out a good way to do it and it is now documented in the ArgoUML cookbook.


Distributed version control systems

Recently I had to use Mercurial to check out the NetBeans sources. Since the instructions were for the whole NetBeans repository, I followed them and I now have a local checkout of it. NetBeans is big, so you would think it would take some hours to get the whole thing, even more so if you consider that Mercurial clones the whole repository with the full history. No, it was surprisingly fast and its command line interface is very easy to use.

Alas, I currently have no project where this or other distributed SCMs are used, but when I get a break from ArgoUML and start experimenting with open source projects from common-lisp.net, there is a good chance that they'll be darcs based.


Debugging model-mdr

In the past 2 months I have been working on and off with other people to fix the problems in ArgoUML's model-mdr subsystem (see issue #4946). This is a core part of the ArgoUML implementation and it basically wraps NetBeans MDR into an implementation of the ArgoUML model subsystem.

MDR is based on JMI (a standard from the Java Community Process), which is itself based on MOF (a standard from OMG). I was more or less familiar with the ideas behind MOF, but, as always, the devil is in the details and I'm now reading the standard so that I don't bump into so many problems.

My lack of knowledge about MOF was made worse by the sources of the several MDR jars not being included in the ArgoUML repository. Also, unusually for an open source project, MDR doesn't make its sources available as a download - you must check them out with Mercurial (more on this later) or CVS. Even so, the checkout doesn't contain the sources for jmi.jar and mof.jar. The MOF and JMI standards don't have a zipped javadoc to help when you bump into unexpected problems in the debugger. All this is making the whole exercise harder than needed!

TODO: get my hands on the jmi.jar and mof.jar sources!


Wednesday, December 05, 2007

2007-12-05

I'm now working on issue #4923, and this requires some measurements of the time the automated headless tests take to run. Thanks to Linus' efforts in setting up a continuous integration server this isn't very difficult: I will simply take the latest 10 results from the revisions of the JUnit reports summary and compare them with the next 10 results after making my change in the model-mdr implementation.

So, here are the values and the total average before the change:

Before changes, Java 5 tests.

revision  Tests  Failures  Errors  Success rate  Time (s)
4456      1115   1         0       99.91%        3065.731
4443      1115   1         0       99.91%        2921.652
4431      1115   1         0       99.91%        2899.956
4408      1117   2         0       99.82%        2792.906
4392      1116   3         0       99.73%        2789.595
4378      1116   0         0       100.00%       2866.223
4367      1116   0         0       100.00%       2842.319
4355      1117   0         0       100.00%       2861.930
4345      1117   0         0       100.00%       2887.080
4340      1101   0         0       100.00%       2787.845

Average time (s): 2871.524

 

Before changes, Java 6 tests.

revision  Tests  Failures  Errors  Success rate  Time (s)
4462      1115   1         0       99.91%        2715.539
4448      1115   1         0       99.91%        2665.635
4437      1115   1         0       99.91%        2665.288
4400      1116   3         0       99.73%        2526.352
4383      1116   0         0       100.00%       2627.562
4372      1116   0         0       100.00%       2596.418
4360      1117   0         0       100.00%       2651.676
4350      1117   0         0       100.00%       2654.227
4332      1101   0         0       100.00%       2514.612
4318      1101   0         0       100.00%       2509.551

Average time (s): 2612.686

Tomorrow I'll run tests with ArgoUML running to check whether the MDRModelImplementation constructor is called during its execution, and if not I'll commit my changes to MDR. Then it is a matter of waiting 10 days for the verdict on the performance hit.

Update on 2007-12-19: added averages and results after changes.

After changes, Java 5 tests.

revision  Tests  Failures  Errors  Success rate  Time (s)
4611      1116   0         0       100.00%       2910.440
4594      1116   0         0       100.00%       2917.587
4581      1116   0         0       100.00%       2905.284
4570      1116   0         0       100.00%       2903.603
4559      1116   0         0       100.00%       2898.110
4546      1116   0         0       100.00%       2909.500
4533      1115   0         0       100.00%       2906.624
4521      1115   0         0       100.00%       2904.351
4508      1115   0         0       100.00%       2939.427
4496      1115   0         0       100.00%       2903.854

Average time (s): 2909.88

 

After changes, Java 6 tests.

revision  Tests  Failures  Errors  Success rate  Time (s)
4586      1116   0         0       100.00%       2665.922
4575      1116   0         0       100.00%       2665.115
4564      1116   0         0       100.00%       2660.793
4552      1116   0         0       100.00%       2677.356
4538      1115   0         0       100.00%       2668.918
4526      1115   0         0       100.00%       2677.958
4514      1115   0         0       100.00%       2699.183
4501      1115   0         0       100.00%       2687.672
4488      1115   0         0       100.00%       2630.221
4474      1115   1         0       99.91%        2757.297

Average time (s): 2679.04

So, a ~2.5% performance hit in Java 6 and a ~1.3% hit under Java 5.
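The percentages are simply the relative increase of the average times reported above. A small sketch of that arithmetic follows; the class and method names are mine, and only the four averages come from the measurements.

```java
import java.util.Locale;

class PerformanceHit {

    /** Relative increase of 'after' over 'before', in percent. */
    static double percentIncrease(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Average times in seconds, as reported in the tables.
        double java5 = percentIncrease(2871.524, 2909.88);
        double java6 = percentIncrease(2612.686, 2679.04);
        System.out.println(String.format(Locale.ROOT, "Java 5: %.1f%%", java5));
        System.out.println(String.format(Locale.ROOT, "Java 6: %.1f%%", java6));
    }
}
```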


Monday, November 19, 2007

2007-11-19

While I was testing a patch by Lukasz Gromanowski, I found a bug in org.argouml.ui.SettingsDialog. This one is interesting. The contract established by GUISettingsTabInterface is that implementers will be called when the user saves the configuration. But that wasn't happening for SettingsTabCpp. After some debugging, and after having the bug in front of me several times, I finally understood. SettingsTabCpp does not extend JPanel, and the SettingsDialog was only invoking the callback methods of those components contained in its tabs member (an object of type JTabbedPane) that were of type GUISettingsTabInterface.
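The mechanism can be shown with a simplified model that uses no Swing at all. Everything here is hypothetical: SettingsTab stands in for GUISettingsTabInterface, plain objects stand in for panels, and saveViaTabs is just one possible way to avoid the problem, not necessarily the fix that went into ArgoUML.

```java
import java.util.ArrayList;
import java.util.List;

class SettingsCallbackSketch {

    /** Hypothetical stand-in for the settings tab contract. */
    interface SettingsTab {
        Object getPanel();   // the component that is shown in the dialog
        void handleSave();   // callback expected when the user saves
    }

    /** A tab that is its own panel, like a tab class that extends JPanel. */
    static class PanelTab implements SettingsTab {
        boolean saved;
        public Object getPanel() { return this; }
        public void handleSave() { saved = true; }
    }

    /** A tab that hands out a separate panel object, as SettingsTabCpp did. */
    static class DelegatingTab implements SettingsTab {
        boolean saved;
        private final Object panel = new Object();
        public Object getPanel() { return panel; }
        public void handleSave() { saved = true; }
    }

    /** The flawed approach: look for the interface among the displayed components. */
    static void saveViaComponents(List<Object> components) {
        for (Object component : components) {
            if (component instanceof SettingsTab) {
                ((SettingsTab) component).handleSave();
            }
        }
    }

    /** One possible alternative: remember the registered tabs and call them directly. */
    static void saveViaTabs(List<SettingsTab> tabs) {
        for (SettingsTab tab : tabs) {
            tab.handleSave();
        }
    }

    public static void main(String[] args) {
        PanelTab ordinary = new PanelTab();
        DelegatingTab cpp = new DelegatingTab();
        List<Object> shown = new ArrayList<>();
        shown.add(ordinary.getPanel());
        shown.add(cpp.getPanel());
        saveViaComponents(shown);
        System.out.println("ordinary saved: " + ordinary.saved + ", cpp saved: " + cpp.saved);
    }
}
```

With saveViaComponents the delegating tab is silently skipped, because the object found among the displayed components is its panel and not the tab itself.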


Monday, November 12, 2007

My promotion to core developer

I was promoted to core developer by Linus Tolke and I believe in agreement with the other active core developers – Bob Tarling, Michiel van der Wulp and Tom Morris. I'm very happy about this and it will be motivating to start working more often in the core of ArgoUML.

Tom sent a very warm welcome message to the developers mailing list. Regarding my part, he refers to two features I'm keen to get involved in: the profiles and the support for parameterized classes and UML templates in general. Well, I was already involved, but now I'm much more motivated to work directly in the core to advance these two features. These are central for ArgoUML to be a good basis for C++ model driven development, which will continue to be my main focus for the future.


Refactoring org.argouml.uml.profile

As stated in issue #4885 and in the dev mailing list thread "org.argouml.uml.profile - success in working from models and improvement proposals", I'm working on refactoring the profile subsystem of ArgoUML so that it is possible for modules to define their own profiles. The main problem to get rid of is that there are singletons in it, such as the ProfileManagerImpl. With very few exceptions, the singleton is an abused design pattern. More so in Java projects, since Java doesn't support global variables and, guess what, a singleton is a replacement for the humble global variable, even if a very clumsy and pernicious one.

Before proceeding with my rambling about singletons being bad, let me say that I'm very happy with the work contributed by Marcos Aurélio, one of the developers who joined in the Google Summer of Code 2007. The package org.argouml.uml.profile is coherent, free of monster classes and methods, with a nice balance between abstract and implementation classes and with a very pleasant distribution of responsibilities amongst the classes. Furthermore, so far I haven't found a single defect!

The refactoring will be much easier than what I will have to do to improve the GeneratorCpp – which is a singleton :-( ... and that brings us back to my rambling...

Singletons are clumsy because, instead of one line of code declaring a variable at global scope [1], one line initializing it in some appropriate place of your code and a direct reference from wherever you want to access it, you now have to define a private constructor and a static accessor method, and then call this method from wherever you need to access the object. They are pernicious because, if you want to abstract the implementation of the global variable, it won't be possible – every single user object will now refer to the singleton class directly, even if it doesn't need to, because one of its owners or more closely related objects could have provided the object itself. Another bad effect is that when the application eventually evolves and you would like to have more than one of those objects, or a fresh one for another piece of work or for unit testing, you'll have the singleton and the singleton accessing code stopping you from doing it.
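A short sketch contrasting the two styles. The names are hypothetical and chosen only for this illustration; this is not the ProfileManagerImpl code.

```java
import java.util.ArrayList;
import java.util.List;

class SingletonVersusInjection {

    /** Singleton style: private constructor plus static accessor. */
    static final class SingletonManager {
        private static final SingletonManager INSTANCE = new SingletonManager();
        private final List<String> profiles = new ArrayList<>();

        private SingletonManager() { }

        static SingletonManager getInstance() { return INSTANCE; }

        void register(String profile) { profiles.add(profile); }

        int count() { return profiles.size(); }
    }

    /** Alternative: an abstraction that an owner hands to its clients. */
    interface ProfileRegistry {
        void register(String profile);
        int count();
    }

    static final class SimpleRegistry implements ProfileRegistry {
        private final List<String> profiles = new ArrayList<>();
        public void register(String profile) { profiles.add(profile); }
        public int count() { return profiles.size(); }
    }

    /** The client depends only on the interface it is given, not on a concrete class. */
    static final class CppModule {
        private final ProfileRegistry registry;

        CppModule(ProfileRegistry registry) { this.registry = registry; }

        void enable() { registry.register("UML profile for C++"); }
    }

    public static void main(String[] args) {
        // Each piece of work, or each unit test, can get a fresh registry.
        ProfileRegistry first = new SimpleRegistry();
        ProfileRegistry second = new SimpleRegistry();
        new CppModule(first).enable();
        System.out.println(first.count() + " " + second.count());

        // Every caller of the singleton shares the same state.
        SingletonManager.getInstance().register("UML profile for C++");
        System.out.println(SingletonManager.getInstance().count());
    }
}
```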

Today I found yet another pernicious effect – loss of control of initialization order. In my checked out copy I have made a spike to check whether the plan of having the C++ module provide the UML profile for C++ would work, based on the support recently contributed by Marcos Aurélio. So, in the C++ module I have defined a new ModuleInterface implementation that registers a ProfileCpp object in the ProfileManagerImpl instance. Now, guess who is indirectly instantiating both ProjectManager and ProfileManagerImpl? Yeah, the humble C++ module, or rather, indirectly, the ModuleLoader2 is doing it! Worse, while at it, I noticed that the order of initialization of modules by ModuleLoader2 seems to be arbitrary. Because SettingsCpp is also loaded as a module, and it accesses the GUI instance, this will also be initialized not explicitly by Main.main, but indirectly by the SettingsCpp module.

Isn't this bad?!?

The way to solve this is via explicit initialization of subsystems and subsystems which keep the details for themselves. The best example is the org.argouml.model subsystem. It provides means for explicit initialization and access to it is via static functions of the Model class.
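As a rough sketch of what explicit initialization behind static functions can look like: the names below (Profiles, ProfileFacade, initialise) are hypothetical and are not the API of org.argouml.model or of the profile subsystem.

```java
import java.util.Objects;

class ExplicitInitSketch {

    /** The only thing clients get to see of the subsystem. */
    interface ProfileFacade {
        String defaultProfileName();
    }

    /** Static entry point; the implementation stays hidden behind it. */
    static final class Profiles {
        private static ProfileFacade implementation;

        private Profiles() { }

        /** Called once, explicitly, from the application start-up code. */
        static void initialise(ProfileFacade impl) {
            if (implementation != null) {
                throw new IllegalStateException("already initialised");
            }
            implementation = Objects.requireNonNull(impl);
        }

        /** Fails loudly instead of initializing the subsystem as a side effect. */
        static ProfileFacade get() {
            if (implementation == null) {
                throw new IllegalStateException("not initialised yet");
            }
            return implementation;
        }
    }

    public static void main(String[] args) {
        // The start-up code decides when and with what the subsystem starts.
        Profiles.initialise(() -> "example profile");
        System.out.println(Profiles.get().defaultProfileName());
    }
}
```

A module that touches the subsystem too early now gets an exception, so an accidental initialization order shows up immediately instead of depending on which module happens to be loaded first.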

Alas, the Profile subsystem may not need to have several implementations, as is required of the Model subsystem, so I won't restrict its implementation as much, for instance by making only interfaces available to clients. But I think it will be easier to use and maintain if it loses some of its singletonitis. Check the ideas in issue #4885 and please send some feedback if you have ideas on how to deal with this differently.

[1] In modern programming languages, "global variables" aren't normally global anymore, since they are normally contained in a specific package or namespace. A pair of good examples are java.lang.System in Java and std::cout in C++. These "globals" aren't frowned upon, and the libraries were surely designed by above average developers, so why is an accessible variable so bad?!?


Wednesday, October 03, 2007

Uff! I finished the GeneratorCpp feature sketch

Previously I stated that I wanted to refactor the GeneratorCpp class. A good way would be to use some techniques I learned from Working Effectively with Legacy Code by Michael Feathers (2004), particularly his feature sketches. Well, I worked hard to put the whole class under scrutiny and the result is very depressing (see below). It is very different from the ones Michael has in his blog post.

Note that I used rectangles for functions and ellipses for variables. This is exactly the opposite of the convention Michael uses in his feature diagrams. My diagram is made in OpenOffice.org Draw and I had to use an A2 page so that everything could fit on a single one. I like this approach because, although it is harder to draw in the first place, once you have it you may drag things, group and ungroup, etc. This is important to identify and make clear the clusters that may be extracted from the monster with less pain. I put it in the argouml-cpp doc directory, so, if you get curious, try it out.

GeneratorCpp feature sketch.

Picture 6 – Feature sketch of GeneratorCpp. Rectangles are for methods, ellipses for variables, blue for non-static and red for static. The yellow rectangle to the left denotes a cluster of methods that could easily be extracted, the yellow rectangle in the bottom center contains the methods I wanted to extract related to Associations, but, which are hard to extract from the class.

Don't misunderstand me, feature sketches are one more good thing that I will add to my tool box, but, they must be complemented with other things Michael talks about in his book, such as identifying responsibilities. This is more important if you are dealing with a monster class such as this. Nevertheless, the feature sketch enabled me to see the clusters in the class. Even more important, while I was doing it I reviewed the code in a way I never did before – actually this was the first time I looked at it from start to end. There are variables that keep pure processing state or context (generatorPass and actualNamespace), others that are mostly read and that keep configuration editable by users (e.g., indent, lfBeforeCurly, verboseDocs) and others that store processing results, which are used to generate code that deals with dependencies (e.g., includeCls, predeclCls, systemInc, extInc and localInc).

As a bonus I discovered some undocumented features of the generator that might come in handy to solve some issues that are mounting up in the issues list. For instance, issue #22 (provide a tagged value for user includes with angle brackets) is handled by the method addUserHeaders if the user places the header name within angle brackets in either the source_incl or the header_incl tagged value.


Monday, September 17, 2007

Refactoring GeneratorCpp

The GeneratorCpp class contains almost all the code that is used to provide C++ code generation in the C++ module. The file is over 2900 lines! It will get even worse, since the support for C++ notation will reuse it, so we might need to add even more code to it.

It is a monster class. I know that the responsibility of that class is generating C++ code from a UML model. So, a distant observer could say that this respects the Single Responsibility Principle (SRP). But that would be like saying that log4j could be implemented in a single class because it is software that has the responsibility to support logging.

I read the book Working Effectively with Legacy Code by Michael Feathers (2004) and there he describes very useful techniques on how to refactor such code. I'll apply the feature sketch in my current task, where I have to fix a bug related to the way the generator deals with associations, in order to extract some of the methods into a separate class. Then, I'm planning to test if the result is friendly from the perspective of client code from the notation package.

But, before starting to apply a specific technique, I must reason about the responsibilities of the C++ generator. Here is a – probably incomplete – list:

  • conversion of the UML constructs into C++ equivalent code
    • operations and methods
    • attributes
    • packages
    • associations – includes aggregations, compositions, generalizations
  • documentation
  • tagged values
  • coordination of the code generation for a class
  • C++ notation support
  • indentation
  • generation of header files
  • generation of source files

Many of these responsibilities interact and depend on each other. Because everything is contained in one class, it is all getting mangled into a huge mess. Although I can understand that it seems easier this way, I also think that this is true only on the surface.

My idea for now is to separate the aggregation and composition parts from the rest. But it is difficult, because parts of the methods that deal with these two things also deal with indentation and documentation. This is messy because when I'm generating code I want it to use the indentation options selected by the user, but when I'm providing C++ notation support I want them not to indent things and not to insert C++ headers into the required headers list. So, I'll look at the methods that support these things and use the feature sketch technique to understand how to isolate some parts.
