Procedural Thinking
published: Thu, 19-Jan-2006 | updated: Thu, 19-Jan-2006
Borland have released the latest version of Delphi, known as Delphi 2006. Unlike the two previous releases (Delphi 8—the first .NET version—and Delphi 2005), the news on the street indicates that this one is very good, and it has been extremely well received. Like Delphi 2005, it covers 32-bit Windows development as well as .NET development, but unlike Delphi 2005, it's very stable. All kudos must go to Danny Thorpe (who's now moved to Google) and Allen Bauer (who's now Chief Scientist at Borland), as well as the rest of the Delphi and VCL team, for such a good product.
My topic today is not about Delphi 2006 per se (and given my past rants about D8 and D2005, my Delphi readers will heave a huge sigh of relief), but about object-oriented programming languages and how their depth of object-orientation may affect how you think about OOD and OOP (object-oriented design and programming). Yes, I'm continuing the theme from my previous post, so cry me a river.
You see, for historical reasons, Delphi and C++ are both a weird mix of the procedural and the OO. Java and C# are not, and I don't know enough about VB.NET to even begin to say.
In the former languages, you can just write a routine. It doesn't have to belong to a class as a method and be able to take advantage of encapsulation or polymorphism or anything like that. It can just exist on its own. Pass some stuff in and let it do some work on that stuff.
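To make that contrast concrete, here's a minimal C++ sketch (the names and the word-counting logic are hypothetical illustrations, not from any earlier post): the same work written as a free-standing routine, which Delphi and C++ happily allow, and then wrapped as behavior on a class, which is the only option Java and C# give you.

```cpp
#include <cctype>
#include <string>

// A free-standing routine: it belongs to no class. Parameters go in,
// a value comes out. Legal in C++ and Delphi; impossible in Java or C#.
int countWords(const std::string& text) {
    int words = 0;
    bool inWord = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            inWord = false;
        } else if (!inWord) {
            inWord = true;
            ++words;
        }
    }
    return words;
}

// The same logic expressed as behavior on an object: the text is
// encapsulated state, and the count is something you ask the object for.
class Document {
public:
    explicit Document(std::string text) : text_(std::move(text)) {}
    int wordCount() const { return countWords(text_); }
private:
    std::string text_;
};
```

Both compile and both work; the difference is purely one of where the logic lives, which is exactly why the shortcut is so tempting.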
The code I showed last time is such an example. It exists on its own, not attached to any meaningful class that might offer some other semantics to the user of the routine. To many programmers, myself included, this was the way of doing things in the old days. I grew up, programming-wise, on RPG II/III and Turbo Pascal and this was how you wrote subroutines. You passed in some parameters, and, by golly, the routine would do some nice work using them and possibly return a nice value.
And then Borland produced version 5.5 of Turbo Pascal, known as Turbo Pascal with Objects. There were these weird code constructions with new terminology like encapsulation, polymorphism, and inheritance. It took us Pascal developers a while to understand this strange new world, helped along by a slim TP5.5 manual written by Jeff Duntemann. I also remember receiving Object Professional 1.0 from TurboPower (this was in the days before I worked for them), browsing the source code, and trying to understand it and why Kim and the guys had constructed their object model (another new term!) the way they had. Writing objects yourself was difficult; it seemed you had to warp your mind into another form.
Fast forward 16 or so years.
Over the past couple of decades, a lot has been discovered, learnt, and written about OOD and OOP. Principles have been identified and polished (like the Single Responsibility Principle or the two Design Patterns principles I mentioned briefly last time). New notions have been added to OO languages (think of interfaces, mixins, and the like), all to aid the encapsulation of data and the hiding of behavior in our designs. New design concepts like Contract Programming, Design Patterns, and Domain-Driven Design have invaded our thoughts. New methodologies like XP and TDD have helped us produce better class models and implementations.
And yet the procedural mindset is still with us. So for the code I talked about last time, what is its purpose? From what requirement did it arise? I've never written a program that just read a "text" "file" into a "string". (I suppose I could have written an experiment at one time that did that, but I certainly don't remember doing so.) What is the text that is being read? An ini file, XML, HTML, a data file generated by a mainframe, text for an email, a stacktrace dumped by an application, a Delphi source file? Is the file in the local file system? On a remote server, accessed by a UNC path or some kind of URI? What's the string that's produced by the routine going to be used for? Splitting into delimited lines? Parsed to produce an abstract syntax tree (AST)? Read by a DOM engine? Prefixed and suffixed with more characters and sent off to a mail server? Analyzed with a bunch of regexes?
These days it is much better to approach the problem this routine is trying to solve from another angle. What domain are we trying to model with our code? Suppose we were writing a compiler for Pascal code. The compiler should take a sequence of .pas files, parse them, construct an AST, and, once that's verified, translate the AST into some kind of executable code and produce a program file.
So I can see the need for a compiler object to coordinate the workflow. A lexer object would be told to read a Pascal project and break it up into a series of tokens (some of those tokens might instruct the lexer to read other included source files). The tokens themselves would also be objects. The parser object would take a stream of token objects and construct an AST, another object. The compiler would create a translator object, give it the AST, and tell it to produce executable code. Finally, this stream of executable code would be gathered and written to disk by yet another object.
Maybe in the lexer there would be a need for a source file object, but who knows? I don't: I'd have to discover it as I wrote the code. How would I read the text in the source files? I don't know: again, I'd have to discover it. Maybe line by line would be fine. Maybe reading the file through a buffered stream would suffice. I doubt I'd ever need to backtrack through the file; in other words, the file would be read from start to finish sequentially. So there's possibly no need to read the entire file in one gulp.
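A skeletal C++ sketch of that collaboration might look like the following. Everything here is a hypothetical illustration: a toy expression language (numbers and plus signs) stands in for Pascal, and the class names are mine, not from any real compiler. The point is the shape of the object model, with each object doing one job and the compiler merely coordinating.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy token: the lexer's output. A real token would carry position info.
struct Token { enum Kind { Number, Plus } kind; int value; };

// Lexer: turns source text into a stream of Token objects.
class Lexer {
public:
    explicit Lexer(std::string source) : src_(std::move(source)) {}
    std::vector<Token> tokenize() const {
        std::vector<Token> tokens;
        for (size_t i = 0; i < src_.size(); ++i) {
            unsigned char c = src_[i];
            if (std::isdigit(c)) {
                int v = 0;
                while (i < src_.size() && std::isdigit((unsigned char)src_[i]))
                    v = v * 10 + (src_[i++] - '0');
                --i;  // step back: the for loop will advance again
                tokens.push_back({Token::Number, v});
            } else if (c == '+') {
                tokens.push_back({Token::Plus, 0});
            }
        }
        return tokens;
    }
private:
    std::string src_;
};

// AST: for a chain of additions, just the flat list of operands.
struct Ast { std::vector<int> operands; };

// Parser: consumes the token stream and produces an AST object.
class Parser {
public:
    Ast parse(const std::vector<Token>& tokens) const {
        Ast ast;
        for (const Token& t : tokens)
            if (t.kind == Token::Number) ast.operands.push_back(t.value);
        return ast;
    }
};

// Translator: walks the AST and "emits code" (here, it just evaluates).
class Translator {
public:
    int translate(const Ast& ast) const {
        int total = 0;
        for (int v : ast.operands) total += v;
        return total;
    }
};

// Compiler: coordinates the workflow between the other objects.
class Compiler {
public:
    int compile(const std::string& source) const {
        Lexer lexer(source);
        Parser parser;
        Translator translator;
        return translator.translate(parser.parse(lexer.tokenize()));
    }
};
```

Notice that no object here reads a file at all; whether a source file object (buffered, line-by-line, or otherwise) is needed is exactly the kind of thing you'd discover by writing the lexer test-first.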
And I know that David West would castigate me for even considering a compiler object that "managed" all the other objects.
But nevertheless, no matter what, I wouldn't write or use the routine presented last time. I would be guided towards a domain model through TDD, and that domain would presumably contain such objects as lexer, source file, AST, compiler, translator, executable code stream, and so on.
And that's a problem with languages like Delphi and C++: because they allow standalone procedures and routines, people will use them. They'll take the shortcut. I know: I've done it. The most intricate code I ever wrote was the compression and decompression functionality for TurboPower's Abbrevia. I looked at it the other day: I wouldn't offer me a job based on it these days. There's some attempt at defining an object model (and the objects that are there are nicely encapsulated), but otherwise it's a procedural mess. I shudder.
Using languages like Java and C#, you are forced into thinking "object"ively. The restrictions in the language mean that you have to think in terms of domain models and domain objects and their interactions. You could still use class methods, certainly, but using them, in my eyes, just feels awkward (and I wish FxCop didn't flag those methods that could be static).
And, repeating myself yet again, doing good OOD and OOP is hard. (Why do you think I've not continued my series on the thread pool yet?) The best models seem to involve some kind of iterative process to get at the underlying design. The best models are highly decoupled, involving simple classes with simple methods that don't have dependencies. No wonder people take shortcuts. No wonder some class models seem to involve a lot of encapsulated data and not much encapsulated behavior (violating the Tell, Don't Ask principle). No wonder there are nasty-looking inheritance models, with developers asking questions like "I want to call my grandfather's version of this virtual method and not my parent's".
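To illustrate the Tell, Don't Ask point with a C++ sketch (the account example and its names are my own hypothetical, chosen only because it's a common way to show the principle): the "ask" style exposes data and lets every caller reimplement the rules, while the "tell" style keeps the rule behind a behavior.

```cpp
// "Ask" style: a bag of data with getters and setters. The overdraft
// policy has to live in every caller, so it leaks out of the class.
class AskAccount {
public:
    int balance() const { return balance_; }
    void setBalance(int b) { balance_ = b; }
private:
    int balance_ = 0;
};

// "Tell" style: callers tell the object what to do, and the policy
// (no overdrafts) stays encapsulated with the data it governs.
class TellAccount {
public:
    explicit TellAccount(int balance) : balance_(balance) {}
    bool withdraw(int amount) {
        if (amount > balance_) return false;  // the policy lives here, once
        balance_ -= amount;
        return true;
    }
    int balance() const { return balance_; }
private:
    int balance_;
};
```

With `AskAccount`, every caller does `if (acct.balance() >= amount) acct.setBalance(...)`, and the check gets duplicated, or forgotten, all over the codebase; with `TellAccount` it can't be.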
No wonder there's so much procedural thinking.