Tech Ed Session ARC305: Code Generation: Architecting a New Kind of Reuse (Scott Hanselman, Corillian)
published: Thu, 16-Jun-2005 | updated: Thu, 16-Jun-2005
This was a controversial, hit-them-between-the-eyes, rapid-fire talk from a well-regarded developer in the .NET world. Scott Hanselman is Chief Architect at Corillian Corporation, a company that writes online banking software and sells it to banks, the epitome of a vertical market. It's successful: apparently it has 25% of the market. However, they have a big issue, not faced by many software companies: every license they sell requires extensive customization (Bank X doesn't want their site to look like Bank Y's).
He started off the session with a pretty bold, bald statement: "A Word document has no teeth." By this he meant that a design document written in Word does not break builds, does not fail unit tests, does not cause assertions to fail, etc. Unlike John Donne's Man, it exists as an island.
In other words, a design document written in Word creates a fundamental disconnect between designer/architect and developer. The designer writes down what he visualizes in a design, and it's left to the developer to try to interpret the designer's vision as code. Unfortunately, there's no way for the developer to tie his code to phrases, sentences, or paragraphs in the design document, to ensure he's accurately captured everything the designer wanted, to be notified when the document changes, and so on.
Hanselman's answer (actually it's more than that: he was describing how Corillian write their software) is to define the design in a DSL (Domain Specific Language). The textual expression of the DSL can be anything you like, but it makes sense for it to be XML, just because of the plethora of tools available that can process XML. Corillian also write an XSD (XML Schema Definition) to provide validation services for their design document.
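As a rough illustration of a design document with teeth, the sketch below parses a small, invented design XML and reports an error when a domain object has no description. The element and attribute names are assumptions for the example, not Corillian's schema, and the hand-written check merely stands in for real XSD validation, which Python's standard library does not offer.

```python
import xml.etree.ElementTree as ET

# Hypothetical design document; every element and attribute name is invented.
DESIGN_XML = """
<design>
  <object name="Account">
    <description>A customer's bank account.</description>
    <property name="balance" type="decimal"/>
  </object>
  <object name="Transfer">
    <property name="amount" type="decimal"/>
  </object>
</design>
"""


def validate_design(xml_text):
    """Return a list of problems; an empty list means the design passed."""
    root = ET.fromstring(xml_text)
    problems = []
    for obj in root.findall("object"):
        name = obj.get("name")
        if not name:
            problems.append("object without a name")
            continue
        desc = obj.find("description")
        if desc is None or not (desc.text or "").strip():
            problems.append(f"{name}: missing description")
        for prop in obj.findall("property"):
            if not prop.get("name") or not prop.get("type"):
                problems.append(f"{name}: property needs name and type")
    return problems


if __name__ == "__main__":
    # A build script could fail the build if this prints anything.
    for problem in validate_design(DESIGN_XML):
        print("DESIGN ERROR:", problem)
```

Run as a build step, a check like this is what lets a design change break the build in a way a Word document never could.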
However, it doesn't end there. From this design XML document, Corillian use XSL scripts, code generators (they use CodeSmith), and other hand-written programs to:
- generate the domain object model (DOM) code
- generate the NUnit tests to test the DOM
- generate the HTML help files describing the DOM for Visual Studio
- generate the DOM's Intellisense XML for Visual Studio
- generate the NAnt build scripts (no longer do new files or tests get forgotten)
- generate WordML documents that describe the design
Yes, you read that last one correctly, they can generate Word documents for the design.
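To make the idea of several artifacts from one design document concrete, here is a minimal sketch that turns a tiny design XML into a class, a matching unit test, and a plain-text description. The XML layout, the templates, and the `generate` function are all assumptions for illustration; Corillian's actual pipeline used XSL and CodeSmith, not this approach.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical design document; names are invented for the example.
DESIGN_XML = """
<design>
  <object name="Account">
    <description>A customer's bank account.</description>
    <property name="balance"/>
    <property name="owner"/>
  </object>
</design>
"""

CLASS_TEMPLATE = Template('''class $name:
    """$description"""

    def __init__(self, $args):
$assignments
''')

TEST_TEMPLATE = Template('''def test_${lower}_stores_properties():
    obj = $name($values)
$checks
''')


def generate(xml_text):
    """Return {artifact_name: text} for every object in the design."""
    artifacts = {}
    for obj in ET.fromstring(xml_text).findall("object"):
        name = obj.get("name")
        description = obj.findtext("description", default="").strip()
        props = [p.get("name") for p in obj.findall("property")]
        # Artifact 1: the domain class itself.
        artifacts[f"{name}.py"] = CLASS_TEMPLATE.substitute(
            name=name,
            description=description,
            args=", ".join(props),
            assignments="\n".join(f"        self.{p} = {p}" for p in props),
        )
        # Artifact 2: a unit test that exercises the generated class.
        artifacts[f"test_{name.lower()}.py"] = TEST_TEMPLATE.substitute(
            name=name,
            lower=name.lower(),
            values=", ".join(f"{p}={i}" for i, p in enumerate(props)),
            checks="\n".join(
                f"    assert obj.{p} == {i}" for i, p in enumerate(props)
            ),
        )
        # Artifact 3: human-readable documentation from the same source.
        artifacts[f"{name}.txt"] = f"{name}\n{'=' * len(name)}\n{description}\n"
    return artifacts


if __name__ == "__main__":
    for filename, text in generate(DESIGN_XML).items():
        print(f"--- {filename} ---\n{text}")
```

Changing the description or adding a property in the XML and rerunning the generator updates the class, its test, and its documentation together, which is the information reuse the talk was about.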
So, Corillian reuse the work done and the information in their design document in lots of different ways. It's not code reuse really, it's more information reuse. If you don't like a description of a domain object, just go ahead and alter it in the design XML document and regenerate everything. If you need a new event for a domain object, write the relevant section in the XML, adding a description, hint, help text and the like and regenerate.
On a more practical front, he mentioned that their product build is not just a compilation-only process, but that it generates all artifacts prior to the compilation step. That way, all compiled artifacts are always up-to-date.
He also went into a little bit of detail about their philosophy and why they went this route. It seems that in the retail banking market there is a middle-tier system that nearly everyone uses. It was written in the early nineties in C++ using COM, and was designed to expose a single do-everything routine to which you pass a specially formatted string. It's a bit like running a complex command-line application to which you have to pass all the run information as arguments on the command line.
To survive, they had to encapsulate the various calls to this do-everything routine in a class model ("to add a new account, you call the routine with 'blahblah' as the string parameter", "to transfer data between account X and account Y, you use 'foofoo'", and so on, depressingly on). Once they started writing the class model, they found that they were repeating patterns all over the place: not code per se, but styles of writing code to do something. They experimented with code gen, and it grew from there.
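The encapsulation described above might look something like the following sketch. The `do_everything` function is a stand-in for the legacy routine, and the pipe-and-semicolon string format is invented; only the placeholder command names "blahblah" and "foofoo" come from the talk.

```python
def do_everything(command: str) -> str:
    """Stand-in for the legacy routine: one entry point, one formatted string."""
    verb, _, args = command.partition("|")
    return f"OK:{verb}:{args}"


class Accounts:
    """Typed facade that hides the command-string format from callers."""

    def __init__(self, routine=do_everything):
        # The routine is injectable so tests can substitute a fake.
        self._routine = routine

    def add_account(self, owner: str) -> str:
        # Hypothetical wire format: "<verb>|key=value;key=value".
        return self._routine(f"blahblah|owner={owner}")

    def transfer(self, source: str, target: str, amount: int) -> str:
        return self._routine(
            f"foofoo|from={source};to={target};amount={amount}"
        )


if __name__ == "__main__":
    accounts = Accounts()
    print(accounts.add_account("alice"))
    print(accounts.transfer("X", "Y", 10))
```

Every method follows the same shape (build a string, call the routine, return the result), which is exactly the kind of repeated pattern that invites generation from a template.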
The other thing he talked about was developing web controls that are very specific to your domain, and that have good design behavior. In other words, instead of developing a general data grid that you could reuse, you would instead develop a web control that displayed transactions in an account. This control could be plugged into Visual Studio where the developer could drop it on a page and set the various properties. He said, essentially, that they spent the time to do it right the first time for the first web control, refactoring the code as necessary to produce something that could easily be generated, and then converted it into a template for the code-gen tool.
(Sidebar: apparently writing a web control requires writing several different inter-related classes. Inheritance is not an option -- not having done this before I don't know all the issues -- so you're left with the issue of writing similar code over and over again. This is obviously an ideal candidate for code gen.)
In fact, that same development methodology was used in other places as well. If you need to write a class that encapsulates some behavior for lots of different types of domain objects, write it by hand for the first domain object and refactor/test the heck out of it. Then create a template and generate the code ad nauseam for your different domain objects.
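A minimal sketch of that write-once, template, then generate workflow follows. The `AccountRepository` class and the list of domain object names are hypothetical, and `string.Template` here plays the role that a dedicated tool such as CodeSmith played for Corillian.

```python
from string import Template

# Step 1: the class as first written (and tested) by hand for one domain object.
HAND_WRITTEN = '''class AccountRepository:
    """In-memory store for Account objects, keyed by id."""

    def __init__(self):
        self._items = {}

    def save(self, item_id, item):
        self._items[item_id] = item

    def load(self, item_id):
        return self._items[item_id]
'''

# Step 2: the same text with the domain-specific name turned into a placeholder.
REPOSITORY_TEMPLATE = Template(HAND_WRITTEN.replace("Account", "${name}"))


def generate_repositories(names):
    """Step 3: stamp out one repository class per domain object name."""
    return "\n\n".join(REPOSITORY_TEMPLATE.substitute(name=n) for n in names)


if __name__ == "__main__":
    print(generate_repositories(["Account", "Transfer", "Payee"]))
```

Because the template is derived from code that was already refactored and tested by hand, substituting the original name back in reproduces the hand-written class exactly, which gives a cheap sanity check on the template itself.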
Although the session was thought-provoking, I longed to see some of the code and surf the entire code generation framework to understand the short-cuts they'd taken that he wasn't talking about. In my own forays into code gen, there always seems to be a point at which you can't gen the code anymore because of some specialized behavior, etc. In one project I worked on, we even had to abandon code generation after a certain point. I would have loved to have had some time alone in their codebase, doing some research.