As an ArgoUML contributor I'm going to blog about my activities here, so that they may draw the interest of other developers or help them when doing tasks similar to the ones I've done. AND(!) to share the grand vision that makes an Argonaut what he is: TO THRIVE IN THE BIG DANGEROUS WORLD, TAKING THE Argo TO A GOOD SHORE ;-))

Tuesday, June 13, 2006

Design of htmled and improvement opportunities

So, how complicated do you think htmled might be? This simple 1000-line program does straightforward content extraction and transformation (see "htmled, the handbook editor blog and publisher" and "htmled revisited – Handbook entries to blog posts automated"). It never leaves the HTML format, and it has no concurrency, no database and no distribution. So, if you're a Python developer, you might simply look at the code.

But, as an ArgoUML developer and user, before starting TDD on htmled I made some simple diagrams, which I kept up to date for documentation purposes. The following UML class diagram shows the basic data structure, with corresponding classes in htmled.py. HbFile represents a Handbook file and uses HbFileParser to parse the file into HbDailyEntries, which are themselves composed of HbSubjectEntries. I normally load htmled in the Python shell, create HbFile(s) and then use a PostExtractor to extract Posts from the handbook files.
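The nesting in the class diagram can be sketched as plain Python data classes. This is a minimal sketch under assumptions, not the code in htmled.py: the fields (`path`, `date`, `subject`, `body_html`), the `Post` class and the `PostExtractor.extract` method are hypothetical, chosen only to show how HbFile, HbDailyEntry and HbSubjectEntry nest and how a PostExtractor might walk them. Parsing is left out; the HbFile is filled in by hand instead of by HbFileParser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HbSubjectEntry:
    """One subject written on a given day (hypothetical fields)."""
    subject: str
    body_html: str


@dataclass
class HbDailyEntry:
    """One day of the handbook, composed of subject entries."""
    date: str
    subjects: List[HbSubjectEntry] = field(default_factory=list)


@dataclass
class HbFile:
    """A handbook file; in htmled the filling-in is HbFileParser's job."""
    path: str
    daily_entries: List[HbDailyEntry] = field(default_factory=list)


@dataclass
class Post:
    """A blog post candidate (hypothetical)."""
    title: str
    content_html: str


class PostExtractor:
    """Hypothetical policy: every subject entry becomes one post."""

    def extract(self, hb_file: HbFile) -> List[Post]:
        posts = []
        for day in hb_file.daily_entries:
            for entry in day.subjects:
                posts.append(Post(f"{day.date} - {entry.subject}", entry.body_html))
        return posts
```

In a shell session this would look like `PostExtractor().extract(HbFile("handbook.html", [...]))`, mirroring the create-HbFile-then-extract workflow described above.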


Picture 4 – htmled main classes diagram.

The complexity lies partly in the PostExtractor and mostly in the HbFileParser and its associated classes. Specifically, to parse a handbook file I used the HTMLParser module from the Python standard library together with the State design pattern, which I read about in Robert C. Martin's Agile Software Development book. For this, you may check the UML state diagram I drew while modeling the Finite State Machine for handbook file parsing, as well as the HbFileParsing class in htmled.
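The State pattern on top of HTMLParser can be sketched as follows. This is an illustration under assumptions, not htmled's actual HbFileParser: it assumes each day in the handbook starts with an `<h1>` holding the date and each subject with an `<h2>` holding its title, with the subject's HTML following until the next heading. `HandbookParser` and the state classes (`Outside`, `InDayHeading`, `InDay`, `InSubjectHeading`, `InSubjectBody`) are hypothetical names. It is written for Python 3, where the module is `html.parser`. The parser forwards each HTMLParser callback to its current state object, and each state decides both what to record and which state comes next.

```python
from html.parser import HTMLParser


class State:
    """Base state: ignores every event unless a subclass overrides it."""
    def start_tag(self, parser, tag): pass
    def end_tag(self, parser, tag): pass
    def data(self, parser, text): pass


class Outside(State):
    """Before the first day: wait for an <h1>."""
    def start_tag(self, parser, tag):
        if tag == "h1":
            parser.start_day()


class InDayHeading(State):
    """Inside <h1>: accumulate the date text."""
    def data(self, parser, text):
        parser.days[-1]["date"] += text

    def end_tag(self, parser, tag):
        if tag == "h1":
            parser.days[-1]["date"] = parser.days[-1]["date"].strip()
            parser.state = InDay()


class InDay(State):
    """After a day heading, before its first subject."""
    def start_tag(self, parser, tag):
        if tag == "h1":
            parser.start_day()
        elif tag == "h2":
            parser.start_subject()


class InSubjectHeading(State):
    """Inside <h2>: accumulate the subject title."""
    def data(self, parser, text):
        parser.current_subject()["subject"] += text

    def end_tag(self, parser, tag):
        if tag == "h2":
            subject = parser.current_subject()
            subject["subject"] = subject["subject"].strip()
            parser.state = InSubjectBody()


class InSubjectBody(State):
    """Rebuild the subject's HTML from parser events until the next heading."""
    def start_tag(self, parser, tag):
        if tag == "h1":
            parser.start_day()
        elif tag == "h2":
            parser.start_subject()
        else:
            parser.current_subject()["body"] += parser.get_starttag_text()

    def end_tag(self, parser, tag):
        if tag not in ("body", "html"):  # skip document-level closing tags
            parser.current_subject()["body"] += f"</{tag}>"

    def data(self, parser, text):
        parser.current_subject()["body"] += text


class HandbookParser(HTMLParser):
    """Context object: forwards each HTMLParser callback to its current state."""
    def __init__(self):
        super().__init__()
        self.days = []  # [{"date": str, "subjects": [{"subject": str, "body": str}]}]
        self.state = Outside()

    def start_day(self):
        self.days.append({"date": "", "subjects": []})
        self.state = InDayHeading()

    def start_subject(self):
        self.days[-1]["subjects"].append({"subject": "", "body": ""})
        self.state = InSubjectHeading()

    def current_subject(self):
        return self.days[-1]["subjects"][-1]

    def handle_starttag(self, tag, attrs):
        self.state.start_tag(self, tag)

    def handle_endtag(self, tag):
        self.state.end_tag(self, tag)

    def handle_data(self, data):
        self.state.data(self, data)
```

In this sketch, supporting a new handbook construct means adding a state class rather than growing a tangle of flags inside a single parser, which is the main appeal of the pattern here.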


Picture 5 – htmled Handbook File Parsing Finite State Machine.

Using the State pattern and HTMLParser to parse the handbook files is definitely better than a "reinvent the wheel" approach, but I don't think it is the most appropriate way of solving this problem. I should probably have tried ANTLR for parsing: build an AST and then extract the required information, e.g., DailyEntries, from it. If I revisit this project to improve its functionality, I might very well do just this as a warm-up!
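ANTLR needs a grammar and a generated parser, so it doesn't fit in a few lines here. The two-phase idea behind it (first build a tree, then extract entries from that tree in a separate pass) can, however, be sketched with only the standard library. Note that this swaps ANTLR for a hand-rolled tree builder on top of `html.parser`; `Node`, `TreeBuilder`, `iter_elements`, `extract_daily_entries` and the `<h1>`-per-day, `<h2>`-per-subject handbook layout are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they never become "current".
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A generic tree node: a tag name plus child Nodes and text strings."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Phase 1: build a tree from the HTML, knowing nothing about handbooks."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def iter_elements(node):
    """Yield every element below node in document order."""
    for child in node.children:
        if isinstance(child, Node):
            yield child
            yield from iter_elements(child)


def extract_daily_entries(root):
    """Phase 2: walk the finished tree; assumes <h1> per day and <h2> per subject."""
    days = []
    for element in iter_elements(root):
        if element.tag == "h1":
            days.append({"date": element.text().strip(), "subjects": []})
        elif element.tag == "h2" and days:
            days[-1]["subjects"].append(element.text().strip())
    return days
```

Compared with the state machine, the extraction rules in this sketch live in one small function over a finished tree, so changing what counts as a daily entry doesn't touch the parsing code.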
