As an ArgoUML contributor I'm going to blog my activities here, so that they may draw interest from other developers or help other developers doing tasks similar to what I've done. AND(!) the grand vision that makes an Argonaut what he is, TO THRIVE IN THE BIG DANGEROUS WORLD, TAKING THE Argo TO A GOOD SHORE ;-))

Monday, September 17, 2007

Refactoring GeneratorCpp

The GeneratorCpp class contains almost all the code used to provide C++ code generation in the C++ module. The file is over 2900 lines! It will get even worse, since the support for C++ notation will reuse it, so we might need to add even more code to it.

It is a monster class. I know that the responsibility of that class is generating C++ code from a UML model. So, a distant observer could say that this respects the Single Responsibility Principle (SRP). But that would be like saying that log4j could be implemented in a single class because it is software that has the responsibility to support logging.

I read the book Working Effectively with Legacy Code by Michael Feathers (2004), which describes very useful techniques for refactoring such code. In my current task, where I have to fix a bug related to the way the generator deals with associations, I'll apply the feature sketch technique in order to extract some of the methods into a separate class. Then I plan to check whether the result is friendly from the perspective of client code in the notation package.

But before starting to apply a specific technique, I must reason about the responsibilities of the C++ generator. Here is a (probably incomplete) list:

  • conversion of the UML constructs into C++ equivalent code
    • operations and methods
    • attributes
    • packages
    • associations – includes aggregations, compositions, generalizations
  • documentation
  • tagged values
  • coordination of the code generation for a class
  • C++ notation support
  • indentation
  • generation of header files
  • generation of source files
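To make the list a bit more concrete, here is a minimal sketch of how the coordination responsibility could be kept apart from the per-construct conversions. Every name in it (ClassSpec, MemberGenerator, AttributeGenerator, HeaderCoordinator) is hypothetical and invented for illustration; none of them comes from ArgoUML or from the current GeneratorCpp, and the fixed four-space indent is just a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a UML class; this is NOT ArgoUML's model API. */
final class ClassSpec {
    final String name;
    final List<String> attributes = new ArrayList<>();

    ClassSpec(String name) {
        this.name = name;
    }
}

/** One narrow responsibility: produce the lines for one kind of class member. */
interface MemberGenerator {
    List<String> generate(ClassSpec spec);
}

/** Converts attributes only; knows nothing about files or coordination. */
final class AttributeGenerator implements MemberGenerator {
    @Override
    public List<String> generate(ClassSpec spec) {
        List<String> lines = new ArrayList<>();
        for (String attribute : spec.attributes) {
            lines.add(attribute + ";");
        }
        return lines;
    }
}

/** Coordinates the generation for one class and delegates the details. */
final class HeaderCoordinator {
    private final List<MemberGenerator> parts;

    HeaderCoordinator(List<MemberGenerator> parts) {
        this.parts = parts;
    }

    String generate(ClassSpec spec) {
        StringBuilder out = new StringBuilder("class " + spec.name + " {\n");
        for (MemberGenerator part : parts) {
            for (String line : part.generate(spec)) {
                // Placeholder indent; a real version would ask an indentation policy.
                out.append("    ").append(line).append('\n');
            }
        }
        return out.append("};\n").toString();
    }
}
```

With this shape, supporting operations or associations would mean adding another MemberGenerator rather than adding more methods to one class.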

Many of these responsibilities interact and depend on each other. Because everything is contained in one class, it all gets tangled into a huge mess. I can understand that it seems easier this way, but I think that is true only on the surface.

My idea for now is to separate the aggregation and composition parts from the rest. This is difficult because the methods that deal with these two things also deal with indentation and documentation. It gets messy because when I'm generating code I want the generator to use the indentation options selected by the user, but when I'm providing C++ notation support I want it not to indent things and not to insert C++ headers into the required headers list. So I'll look at the methods that support these things and use the feature sketch technique to understand how to isolate some parts.
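One possible direction, sketched below, is to pass the differing behaviour in as a small policy object, so that the extracted association code does not need to know whether it is producing a file or a notation string. This is only a sketch under assumptions: OutputPolicy, FileGenerationPolicy, NotationPolicy and AssociationEndGenerator are hypothetical names, and mapping a "many" association end to std::vector is assumed for the example rather than taken from the existing generator.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical policy: how to indent and whether required headers are recorded. */
interface OutputPolicy {
    String indent(int level);

    void requireHeader(String header);
}

/** For real code generation: uses the user's indent unit and records headers. */
final class FileGenerationPolicy implements OutputPolicy {
    private final String indentUnit;
    private final Set<String> requiredHeaders = new LinkedHashSet<>();

    FileGenerationPolicy(String indentUnit) {
        this.indentUnit = indentUnit;
    }

    @Override
    public String indent(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            sb.append(indentUnit);
        }
        return sb.toString();
    }

    @Override
    public void requireHeader(String header) {
        requiredHeaders.add(header);
    }

    Set<String> requiredHeaders() {
        return requiredHeaders;
    }
}

/** For notation support: never indents and ignores header requests. */
final class NotationPolicy implements OutputPolicy {
    @Override
    public String indent(int level) {
        return "";
    }

    @Override
    public void requireHeader(String header) {
        // Intentionally empty: notation text must not touch the headers list.
    }
}

/** Hypothetical extracted class that only knows how to render association ends. */
final class AssociationEndGenerator {
    private final OutputPolicy policy;

    AssociationEndGenerator(OutputPolicy policy) {
        this.policy = policy;
    }

    String generateMember(String typeName, String roleName, boolean many, int level) {
        String type;
        if (many) {
            // Assumption for this sketch: "many" ends become a std::vector of pointers.
            policy.requireHeader("<vector>");
            type = "std::vector<" + typeName + "*>";
        } else {
            type = typeName + "*";
        }
        return policy.indent(level) + type + " " + roleName + ";";
    }
}
```

The same AssociationEndGenerator can then be constructed with a FileGenerationPolicy by the code generator and with a NotationPolicy by the notation package, which is the kind of client code I want to check the refactoring against.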
