Design and Code Reviews
published: Thu, 26-Feb-2004 | updated: Sat, 6-Aug-2016
One thing that has been on my mind recently is the subject of design and code reviews. (Warning: this is a long one, folks.)
This was triggered by several independent things happening at once, almost as if they were related.
The first was that MSDN Magazine published an article that I reviewed whilst still at Microsoft. The subject was a discussion of using the standard I/O streams in console apps in C#, and how you can use them with pipes from the command prompt. The author, Michael Brook, went in a lot of satisfying detail about how to do all this, and the number of issues I could find were small, but not insignificant. Some of my comments he took to heart, and some he ignored (to the article's detriment, in my view).
The second was that MSDN (the online version, not the magazine this time) published another article in the series on data structures in C# and .NET. I've already commented on the second in the series, showing that some minor but important theoretical problems had crept past both the author and the reviewer. Well the third article in the series has some design and code problems to which I need to devote another article. (And now the fourth article has been posted, with more issues. Will I be able to keep up?)
The third was a contract I've been working on for a while. The work needed is some enhancements to an older Delphi 3 (scream!) application, which was recently purchased by my client.
Man, oh, man, this code is dreadful. Let me count the ways, and in doing so draw some conclusions about design and code reviews, as well as design and code patterns.
Global variables are evil
Now, I'm sure I don't need to discuss why global variables are bad, but just in case: having a global variable means that you are giving up the ability to ascertain when and where such a variable is being initialized, is being modified, is being read. You may know all about it right now as you write the program, but will you remember in six months time? Will the guy who has to maintain it (Hello! That's me!) understand the semantics of using this global variable?
An example from my early programming days. In those far off times I used to write apps in RPG II for the IBM System/34 and RPG III for the IBM System/38. Every variable in an RPG program is global (it may be different now, I haven't kept up). There were also a set of 99 boolean values called flags and having the identifiers 01 to 99. You got into the habit of commenting every routine's effect on variables and flags. Anyway, there was one program I worked on for a small bank that would fail in peculiar ways in certain conditions. It turned out to be a flag being reused in a routine being called from another.
Anyway, here's a global variable from this source code I'm modifying:
var ok : boolean;
Yessss! I kid you not. No indication with the name that this is a global variable, no subtle warning that you may screw something up if you use or change it. Since the same name is used for local boolean variables throughout the source code (I kid you not, again), it becomes somewhere between hard and difficult to identify where it's used and when it's changed.
Mixing data access code and presentation code is bad
More of an architectural issue this time, I suppose. Throughout this application's source code are SQL statements being constructed and used. It is nigh on impossible to create any unit tests for this kind of code. This source code goes even further: it deliberately adds methods to form classes that just do database work. Let's see what this means: in order to use this method to get at the data you want, you have to create an instance of the form. Thank you very much, but doubleplus ungood.
Actually this points to one of the more difficult design problems we face as database application developers: how to divorce the data access code from the presentation code (by presentation code I mean code that presents the data to the user on the screen, in a browser, on a piece of paper, on a mobile phone's display). Achieving this separation provides a whole host of benefits: better testability and quality (you can test the data access code in an automated fashion), greater ease in supporting other database engines (say, you provide an Access version for single users, but a higher-priced SQL Server version for corporate customers), more ability to optimize database access (the SQL statements will all be in one place in the code not spread throughout the application), improved code reuse (avoiding those "I need this data in this form, so I'll just code another SQL statement to get it, without looking elsewhere in other forms' code to see if it exists already), and so on.
try..finally is your friend
We all know that try..finally helps you write more robust programs that don't suffer from memory and resource leaks. Looking at this program I'm maintaining I see (some identifiers changed to protect the guilty):
procedure TMainForm.ShowBlahFormClick(Sender: TObject); begin with TBlahForm.create(self) do try showmodal; finally free; end; end;
which is pretty good, nice and safe, and:
procedure TMainForm.ShowOtherBlahClick(Sender: TObject); begin OtherBlahForm := TOtherBlahForm.create(self); OtherBlahForm.showmodal; OtherBlahForm.free; end;
which is frankly appalling. (And both of these snippets are from the
same source file: let's all scream in terror together!) Let's see why.
First, it doesn't use a try..finally block. Second, it uses a global
variable called OtherBlahForm
. Admittedly, that's partly due to the
annoying habit of Delphi to auto-create a global variable of the right
type when you create a new form. My recommendation is to delete this
auto-created variable as soon as you can.
The final problem with this second snippet of code is the usual one
about global variables. Suppose you're in a different source file and
you reference OtherBlahForm
. Is is initialized so you can use it as
is? will your app crash if you do so? can you create the reference
again? will there be a resource and memory leak if you do so?
A corollary of this point is that the code reviewer (if not the developer) must be on the alert to resource and memory allocations, and track where they're freed. Is an instance being created and used locally? Then there must be a try..finally block to ensure that it's freed. (This, in essence, is why the first code snippet in this section is safe code, and the second is not.) If the resource is freed far away from where it's allocated, there's little the code reviewer can do, we will have to rely on some kind of heap and resource monitoring program to check for allocations without corresponding frees. Here's an example that should raise red flags in the reviewer's mind:
assignfile(sourcefile,source1); reset(sourcefile); ... lot's of work that could raise exceptions ... closefile(sourcefile);
Functions and procedures should have at most four parameters, maybe five at a pinch
Consider that, in 32-bit Delphi, the first three parameters are passed in registers (ignoring any obvious exceptions to the rule) so ideally you should have three parameters or less. One routine in this program has 26 parameters, most of them strings. Give me a break.
Functions and procedures should be small
Sigh. A difficult one this. How small is small? I was always told a page of printout in the days of fanfold, green-and-white paper. Nowadays the rule of thumb tends to be a screen's worth; in other words, of a size such that the reader doesn't have to scroll back and forth. For the past 18 months, I tend to think of it in terms of complexity: if the routine's cyclomatic complexity (CC, a way of counting the number of paths through the routine) is below 8, I'd feel OK about the size of the routine. (In my last contract, the most complex routine had a CC value of 4.) However, that's only feasible if you have a program that can analyze your source code and calculate the CC for you, so stick to a screen's worth.
Another measure (that's directly related to CC, if you think about it) is the depth of indentation. Don't indent too deep (maybe four levels, max). Consider splitting up a routine is you find that its code is resolutely marching over to the right.
The worst I've seen so far in this particular set of source files is a routine that's 23 pages long and that indents about 8 levels at the worst. That's ridiculously long and is totally incomprehensible and unmaintainable. (But I shall have to have a go. Sigh.)
Premature optimization is a waste of time
How can I put this? Don't optimize until you have a program to optimize, otherwise you could be optimizing something that makes little to no difference to the running of your program. Always code simply and directly. Don't say to yourself "I'll inline this so that it's faster", say to yourself, "I'll extract this code into another method, to make the current routine easier to understand."
Let's put it this way. At some point in your development cycle, you will have some approximation to the required application. Users could use it if they wished. It doesn't have all the bells and whistles, certainly, but it does do something. At this point, you should profile the application for one or more real-life scenarios. For example, if your users will only use 1Kb XML files, don't profile parsing a 1Mb XML file. There may come a time when 1Mb XML files are de rigueur, but you can do the work to make it faster for that situation then. Doing it now is a waste of your time, since 1Mb XML files may never happen for your application, or someone may write an XML parser in hardware, or whatever.
Set yourself some goals at this point, based on your profiling results: "Initial program load to the first screen displayed should be less than 3 seconds", "processing 1000 customer records should take less than 20 seconds", "clicking a button to showing the next window should be less than a second", and so on. Solving each of these scenarios involves different trade-offs, different ways to achieve the goal. It is pointless to say that each and every possible routine in the code must run as fast as possible and no slower. That's equivalent to saying, we will fix each and every bug, no matter how small, avoidable, or rare it is, in our product. You can't: your company would die before you shipped.
Every week (at least) you should be profiling to check that you don't fail the important scenarios you've identified. If you do, investigate. Locate the new code that causing the failure, fix it, alter it, move it so that it runs elsewhere, and so on.
I'm lucky with this application: the author made no attempt at any
optimization whatsoever. Phew! For example, all dates are passed as
strings. Obviously, the TDateTime
type is for wimps. But I'm afraid it
does lead to some bizarre behavior. For example, there's a routine to
calculate today's date next year. If you trace through the code that
gets executed, it does this: calls Date
to get today's date; converts
it into a string representation; extracts the year substring; converts
it into an integer; adds one; converts the new value back to a string;
concatenates the day and the month from the original string to this
new year string to give the answer. (Incidentally, this routine has a
bug: it fails for the 29th February in a leap year. The date a year
away is either 28th February or the 1st March.)
Code duplication is a hanging offense
You should be ruthless in removing code duplication. It's one of the basic code smells and one of the easiest ways to drive the maintenance developer down the road into a murderous rage. Consider this question: how much is spent in original development of an application versus in maintaining and enhancing that application? Robert Glass in Facts and Fallacies in Software Engineering states that "maintenance typically consumes 40 to 80 percent of software costs" and that "enhancement is responsible for 60 percent of all maintenance costs." He further points out that 30% of maintenance is spent in understanding the code. So be nice to your maintainer, he might know where you live.
One of the source code files has five routines that do exactly the same thing. I can see how it happened: they're all event handlers, and it's easier to fill in the blanks in the newly created method body than to think and make a call to another routine that does the required work. Imagine that something else had to be done inside this duplicated code. It's extremely likely that the maintenance guy, unless he'd spent a long time understanding the code, would miss one or more of these additional routines to change.
Delphi may be case-insensitive, but humans are not
Take a look at those preceding code snippets again. Notice that, apart
from the changed names TMainForm
, ShowBlahFormClick
, etc, the
identifiers are all lower-case? Hello? It's Create
and it's ShowModal
and it's Free
, for heaven's sake. We can argue about whether
procedure
, begin
, end
, and self
should have initial capitals, but not
method and routine names that are declared in source code we can look
at.
The code I'm modifying is pretty much all lower-case. It's as if the
shift key hadn't been invented when the app was written. It's
ridiculous: although the compiler doesn't give a flying, we human
readers of the code do. Is it easier to read otherblahform
or
OtherBlahForm
? Case closed. (And no, that doesn't mean we should use
the underscore to separate words.)
if
statement's begin
to be on the same line as the
if
, then do it, and stick to it. If you want it on the next line, at
the same indentation level, do it. At another indentation level? Fine.
Do it and stick to it. Just don't change your style half way through
a source file, like this code does.
Indentation and nicely spaced out code makes a difference to us, the human readers, not to the compiler. Well, OK, it may make your compilation a couple of hundredths of a second slower, but bite the bullet. If you stumble over your own code in six months time because the indentation and spacing is alien to you, you'll lose a lot more than the couple of hundredths of a second in compiling it.
Delphi doesn't care what identifiers are called, but humans do
OK, here's an example, from my current bête
noire. There's a function called empty
and there's another called
empty2
. To use texting parlance, WTF? Here's another example:
showmonth
and showmonth2
. How about decmonth
and newdecmonth
? Super
clear, right? Glad you agree.
Let's think about a function called empty
for a moment. First of all,
according to my previous rule, this should be Empty
. Now, not knowing
anything else about the function at all, let's try and guess some
facts about it. To me the name implies a verb or a command: hey, empty
this! If I now tell you that the 'this' is a string passed by value,
what can we infer? It sets the string to an empty string? But that's
silly, since MyString := '';
does it a lot simpler. Another hint: the
function returns a boolean. Ah ha, maybe the function returns whether
the string is empty or not. But (MyString = '')
does that so much more
succinctly, or even (length(MyString) = 0)
. No, my dear readers, this
function actually returns whether the string consists only of
whitespace characters, where whitespace not only means the space
character, but also null, and some control characters. So why isn't
the function just called IsStringWhitespace? It's a lot clearer to us human
readers. (To the compiler, it's a unique identifier that happens to be
a few characters longer.)
Please name your identifiers so that the code can be read as if it is
English. If you find that you're adding the characters And, or
ExceptWhen, or something similar, take it as a hint that perhaps your
routine is doing too much (it should only be doing one thing). Split up
the routine, and name its parts something significant. For example,
for functions returning a boolean, consider using Is, Was, Check,
Verify verbs to indicate that the function is returning a boolean
value. Consider using command type verbs for procedures that modify
something. Consider the verbs Get or Retrieve for functions that
return something important. If you do this you'll be able to gain
something from the semantic meanings of the identifiers you use. If
you do so, you'll have a pretty good idea of what these routines do:
IsStringWhitespace (a function returning a whether a string
is blank), GetUserName
(return the user name), ClearAddress
(set the
address structure or object to zeros, or nulls), ConvertDateToString
(sometimes abbreviated to DateToString
, to convert a date
to its string representation).
If you pass back a valid/invalid indicator, check it
There's a function in this source code that converts a string value to
a longint value and returns not only the value converted but also an
indication of whether the conversion succeeded (a bit like later
Delphi's TryStrToInt
). The function result is the longint value, the
success indicator is a var parameter (which is called ok
: see above
about a global variable called ok
). Here's a cleaned up and renamed
version of the function header:
function StrToInt(aValue : string; var aIsValid : boolean) : longint;
Now, we can argue about whether it's better to have the function
result as the boolean, but it behooves the user of this function to
actually check whether the conversion succeeded. Needless to
say, it's hardly ever checked. Why? Well I think the main reason is
because the function returns the converted integer value. It's just
too damned easy to blow off the "did it succeed or not" indicator when you
can just use the function in some expression. If it were defined as
TryStrToInt
is defined, we'd have to take notice of the
success flag, since it would be impossible to use in an expression.
You must refactor to simplify the code
A lot of the above
points can be boiled down to this important point. If we took care of
code smells and refactoring so that they disappear, we'd have better
code. So, let's take an unaltered routine from this program and
refactor it to death. I'll take the routine I just mentioned (it is
obviously redundant since Delphi's SysUtils
provides a better one, and
it certainly doesn't contain any intellectual property). Here it is,
unaltered.
function str_to_int(passtr : string;var ok : boolean):longint; var spare : longint; code : integer; begin passtr := remove(' ',passtr); if passtr = '' then passtr := '0'; val(passtr,spare,code); if code = 0 then ok := true else ok := false; if code = 0 then result := spare else result := 0; end;
Sorry, I should have warned you first. Let's clean up the indentation and spacing first, together with some capitalization, to the style I prefer.
function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint; var Spare : longint; Code : integer; begin Passtr := Remove(' ', Passtr); if (Passtr = '') then Passtr := '0'; Val(Passtr, Spare, Code); if (Code = 0) then Ok := true else Ok := false; if (Code = 0) then Result := Spare else Result := 0; end;
The first executable line calls a routine to remove a certain
character (here, the space character) from the string. Although I
agree we should remove leading and trailing spaces, I don't agree that
we should be removing spaces in between the non-space characters. In
other words, " 1 " is a valid number to me, but "1 2" is not. Let's
assume that there's a Trim
function available to us that trims leading
and trailing blanks from a string.
Now take a look at the next couple of lines. Here we're checking to see if the passed in string is now completely empty, and, if so, setting it to '0'. And then we convert that '0' to an integer. Why bother? Surely we can just say, if the string is empty, the routine successfully converts it to a result of zero.
function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint; var Spare : longint; Code : integer; begin Passtr := Trim(Passtr); if (Passtr = '') then begin Result := 0; Ok := true; end else begin Val(Passtr, Spare, Code); if (Code = 0) then Ok := true else Ok := false; if (Code = 0) then Result := Spare else Result := 0; end; end;
Next, look at the usage of the variable Spare
. It's just standing in
for the pre-defined variable called Result
, so why don't we just use
Result
instead, and get rid of Spare
. Next we have two if statements
with exactly the same condition. Big slap upside the head here. Let's
clean that up straightaway.
function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint; var Code : integer; begin Passtr := Trim(Passtr); if (Passtr = '') then begin Result := 0; Ok := true; end else begin Val(Passtr, Result, Code); if (Code = 0) then Ok := true else begin Ok := false; Result := 0; end; end; end;
Is that last statement that sets Result
to 0 really necessary? Looking
at the help for Val
, we can't say (the help gives no hint as to
whether the value is zeroed or whether it contains the number parsed
until the error was found). So, we'll let it stay.
The final refactorings I'll do will be some renaming and reorganizing:
function ConvertStrToInt(aValue : string; var aIsValid : boolean) : longint; var ErrorCode : integer; begin aIsValid := true; // assume string is valid aValue := Trim(aValue); if (aValue = '') then Result := 0; else begin Val(aValue, Result, ErrorCode); if (ErrorCode <> 0) then begin aIsValid := false; Result := 0; end; end; end;
Of course, we should have some tests set up with DUnit to verify the correct working of the code. I would suggest testing '' to succeed with result 0; ' ' to succeed with result 0; '1 2' to fail with result 0; 'a' to fail with result 0; and '123' to succeed with result 123.
Conclusion
Well, after all that, I hope you found this discussion helpful. Although most of the article was geared to developing in Delphi, there are lots of nuggets that apply in other languages, and on other platforms. One thing you can do now is to be your own code reviewer. Take a look at your code critically. Code duplication? Remove it. Nasty identifier name? Change it. A routine with pages of code? Split it up. Go and make your code reviewer happy.