Review: Ready-to-Run Delphi 3.0 Algorithms

published: Wed, 18-Jun-2003   |   updated: Sat, 20-Aug-2016

Ready-to-Run Delphi 3.0 Algorithms by Rod Stephens (John Wiley & Sons) $39.99

(Note: this review was first written some time ago, in August 1998, before I started on my own book. Consequently several things have changed: the book's web site has disappeared, having been amalgamated into Stephens' other Visual Basic web site, and McKinzey-White no longer exists. Also, I warn you, this review is long.)

This book is a disappointment. I've been telling everybody who would listen (even those who wouldn't) that it was time someone wrote an algorithms book that targets Delphi. I myself write a monthly algorithms column for The Delphi Magazine and I've seriously thought about writing such a book. So imagine my excitement when I saw this book at McKinzey-White here in Colorado Springs. I bought it straight away: if the book's a success, it may prompt other publishers to do the same, so every little bit I can do helps. And someone may give me a five-figure advance to take six months off and write it (yeah, sure, dream on, sunshine). I mean, just consider all the other Delphi books out there, some better than others, all dealing with the IDE, component writing, the Internet, what have you. At last, something new.

I started reading it that evening. First, though, I jumped onto the Internet: there was a URL on the back cover, "Visit the author's comprehensive Delphi site at: www.delphi-helper.com". Cool, I thought, he's serious about this book; he's probably got a bunch of errata on a page (like Sedgewick and Knuth have for their books; they recognize that errors creep in and it's best to acknowledge them and list fixes). No such luck: the page you get is titled "VB Helper." This is where I learned that the book was rewritten/converted from his VB book, Ready-to-Run Visual Basic Algorithms. And no errata. Oh well.

The Introduction was quickly read (weird, no dedication to his wife or cat) and I was onto Chapter 1, Fundamental Concepts. Here Stephens introduces what he means by algorithms and talks about efficiency and the big-O notation. The tone is dry but readable. The minor bits of code seem spread out (he uses four-space indentation). Then I read the function LocateItem on page 8. It's got a bug: a subtle one, but a bug nevertheless. Stephens doesn't know that Delphi's documentation specifically states that the loop counter's value after the end of a for loop is undefined; he assumes it's one more than the final value in the for statement. I remember that when Borland changed the behavior of the for loop between Delphi 1 and 2, it broke people's applications, because they were assuming the loop counter had a particular value.
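The safe pattern, for what it's worth, is to capture the index inside the loop body and never touch the counter after the loop ends. A minimal sketch (my own code, not the book's; the signature is invented for illustration):

function LocateItem(const A: array of Integer; Target: Integer): Integer;
var
  i : Integer;
begin
  Result := -1;              { sentinel meaning "not found" }
  for i := Low(A) to High(A) do
    if A[i] = Target then
    begin
      Result := i;           { capture the index inside the loop... }
      Break;                 { ...so nothing relies on i afterwards }
    end;
end;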

Oh well, minor point. Chapter 2: Lists. He starts off with arrays and shows how to define them so that you can easily resize them. For some reason he uses a 1-based array rather than the usual 0-based array. All right, never mind. His code is procedure-based rather than class-based, which is fine for discussing the algorithms. He actually points out that encapsulating the code into a class is a Good Thing and writes a section on it. Then he moves on to unordered lists (a.k.a. a bag). He introduces the concept of a garbage value to mark items in the array as deleted (so that you don't have to rearrange items when you delete one). His array is an array of integers at this point, and he suggests using -32767 as a garbage value. Huh? Does he think integers are two bytes long in 32-bit Delphi 3? I turn over the page to page 24 to meet the following type declaration:

type
  MyRecord = record
    Name : string * 20;
    IsGarbage : boolean;
  end;

My heart sank. This is a Visual Basic string declaration, not a Delphi one. Didn't anybody tech edit this book? (A quick look at the front cover reveals no tech editor's name.) I'm sorry, but this kind of stuff jumps off the page at you. I'm not arguing about subtle behavior here; this is inexcusable.
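For the record, assuming he wanted a fixed 20-character string, the legal Delphi declaration would be:

type
  MyRecord = record
    Name : string[20];    { Delphi short-string syntax, not VB's "string * 20" }
    IsGarbage : boolean;  { marks this slot as deleted }
  end;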

A couple of pages further on, he's finished with array lists and moves on to linked lists. Whoa, there. What about TList, the famous array list that's been part of Delphi since version 1? Where's a discussion of its strengths and pitfalls? Where's a nifty TList lookalike class that performs garbage management on the side, as he's just been discussing? Nope, sorry: it wasn't in the VB book, so it's not here.

All right, linked lists. I sighed on page 27 because there were some VB string declarations, yet again. Anyway, he moves quickly through what a singly linked list is and how to insert and delete items. I hit page 32, where he talks about the differences between array lists and linked lists. He says "Linked lists are usually better, but the array-based list is superior in one area: memory usage." Only one area? He misses the very important advantage of arrays: accessing the Nth item is an O(1) operation, compared with the O(N) operation for linked lists. This advantage was sufficient to make Delphi's developers write TList as an array and not as a linked list.

We move on through some linked list variants: circular lists, doubly linked lists, threaded lists. Suddenly, on page 39, he's declaring strings properly. Good (not that this means the proofreading was more thorough here; the code is extracted from the CD). There's a bizarre sentence a third of the way down page 39: "Because '~' comes alphabetically after all other visible ASCII characters, the program sets the bottom sentinel's LastName value to '~~~~~~~~~~~~~~~~'." What's happening here is that he wants a special value to stop traversals of the list running off the end. This is complete balderdash, though: there are 120-odd characters that come after '~'. This is a hack, pure and simple, and does not belong in a book like this.
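A cleaner approach, sketched here with invented names, is to recognize the sentinel by identity rather than by value: keep a pointer to the tail node and stop when you reach it.

type
  PNode = ^TNode;
  TNode = record
    LastName : string[20];
    Next     : PNode;
  end;

procedure TraverseList(Head, Tail: PNode);
var
  Cur : PNode;
begin
  Cur := Head^.Next;         { Head is assumed to be a dummy node }
  while Cur <> Tail do       { stop at the tail node by identity... }
  begin
    { ...process Cur^ here... }
    Cur := Cur^.Next;        { ...so no magic "largest string" is needed }
  end;
end;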

Chapter 3: Stacks and Queues, and I was starting to feel somewhat disappointed with the execution of this book. Still, I steeled myself and forged on. We're back to classes again, having spent most of the list chapter in procedural-land. First off are array-based stacks. Page 45 has a bug in the code: the class is TArrayStack, yet one of the methods uses TSimpleStack. All right, minor. Then we're into linked list stacks. Page 47 is confirmation of something I suspected earlier: he thinks integers are 2 bytes long; it says so in the text. He also states that the main drawback to linked list stacks is that they require extra memory. Well, yes, but he doesn't state that they're slower as well: you are continually allocating and disposing of nodes, which all takes time. This was starting to get to me. I looked back at the introduction: the book is aimed at programmers who have "a good understanding of the fundamentals of Delphi." It's a shame that this doesn't seem to apply to the author.

Queues next. Circular queues. Linked list queues. Priority queues. Multiheaded queues. The latter section was quite interesting, and my doubts started to ease. Maybe the first couple of chapters were an aberration and the rest of the book would get better.

Chapter 4, then: Arrays. We start off with triangular arrays (imagine a two-dimensional matrix of items where you only need the items below the leading diagonal, or where the entries either side of the diagonal are the same) and how to map them onto a linear array. I turn over to the second page of the chapter, page 62, and see this bit of code:

x := Round(i * (i - 1) / 2) + j;

(It calculates the element number in a linear array for an element in the triangular array.) DO WHAT? Look at this code again. All the variables are integers. It uses a floating point divide operation and then rounds the result back to an integer. An absolute beginner's mistake, or a badly converted VB expression; either way, inexcusable. The div operator (integer divide) seems to elude Stephens here. (Furthermore, the compiler would optimize the "div 2" into a shift-right operation.) And then we get some gobbledygook about packing a triangular array into a linear array being an example of the space versus time tradeoff: you're using less space than a two-dimensional matrix where half the elements aren't used, but you have a 'complex' calculation to work out where an element is stored. Huh? Exactly how does Stephens think the compiler works out where an element in a matrix can be found? Here we are:

x := (i * ElementsInRow) + j;

(Of course, this calculation depends on whether things are zero-based or not.) We're saving a decrement and a shift-right operation, that's all (provided we don't use floating point arithmetic). Virtually nothing; not even worth mentioning.
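For completeness, the integer-only version of the book's triangular-array formula needs nothing but div; since i * (i - 1) is always even, the division is exact:

x := (i * (i - 1)) div 2 + j;   { pure integer arithmetic; no Round needed }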

And then I noticed something else from looking at the book's code. We'd suddenly moved from one-based arrays to zero-based arrays, and yet nothing had been said in the text (using zero-based arrays makes the above calculations easier). Another bit of sloppiness.

By that point I'd reached page 64 (of 400 pages), thoroughly depressed about the whole book, and I'd come to a conclusion: the information about data structures and algorithms discussed so far was interesting and, in the main, accurate. The code was not up to scratch, though; there were just far too many slip-ups and bugs. It showed that the book was a rushed conversion from the Visual Basic edition.

[A couple of weeks later—I got married in the meantime.]

Rereading what I'd written last time makes me feel a little uneasy; I feel as if I'd been too hard. But then I thought: no, someone is at fault with this book, whether it's Rod Stephens himself for not taking better care in converting his VB book, or Wiley for not getting it tech edited. The errors I'm finding should have been found before printing, by Stephens himself or by a proper tech editor. In fact, to compound the point, I've just revisited the book's URL and it still points to a site for VB. Anyway, onwards...

After talking about triangular matrices, Stephens moves on to the forward star data structure for storing irregular arrays, one that I didn't know about, mainly because I'd automatically use a linked list representation instead (which he then discusses). We move on to sparse arrays (large arrays with comparatively few non-null entries), all of which are implemented with linked lists. This section was informative and interesting.

Chapter 5, and we move on to recursion. This chapter was excellent. We start off with some simple algorithms (factorial and GCD, the greatest common divisor) and he discusses recursive implementations of these, together with an analysis of their run-time characteristics using the big-O notation. A small niggle: the algorithm for calculating the GCD was known to the ancient Greeks and was described in Euclid's Elements; it was not discovered by Euler in the 18th century. We then encounter recursive Hilbert and Sierpinski curves, and again we talk about their run-time characteristics. Having met a few recursive routines, Stephens then talks about why recursion is good, and when it can be bad. We have a discussion on how to remove recursion via explicit stacks and the like. As I said, an interesting chapter, probably because the text does not attempt to provide anything "Ready To Run" but instead talks about more general things: not a recipe with absolute quantities, more general cooking guidelines.
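Euclid's algorithm, incidentally, is about as clean a recursive example as you could ask for. A sketch of my own, not the book's code:

function GCD(A, B: Integer): Integer;
begin
  { Euclid's algorithm, assuming non-negative inputs }
  if B = 0 then
    Result := A                  { the remainder sequence has terminated }
  else
    Result := GCD(B, A mod B);   { recurse on the divisor and remainder }
end;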

And now we're on to Chapter 6, where we start looking at trees. We see some representations of generic N-ary trees, including a forward star representation. He talks about traversals of trees: preorder, inorder, postorder, and level order (breadth-first traversal), and shows that inorder is the only traversal whose concept doesn't really extend beyond binary trees. Sorted binary trees (binary search trees) are discussed, including insertion and deletion of nodes. Threaded trees are introduced and the problems of maintaining them are solved. We then meet quadtrees.
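To make the inorder idea concrete, here is a sketch with invented names; for a binary search tree, an inorder walk visits the values in ascending order:

type
  PTreeNode = ^TTreeNode;
  TTreeNode = record
    Value       : Integer;
    Left, Right : PTreeNode;
  end;

procedure InOrder(Node: PTreeNode);
begin
  if Node <> nil then
  begin
    InOrder(Node^.Left);       { visit the left subtree first }
    { ...process Node^.Value here: for a binary search tree,
      values arrive in sorted order... }
    InOrder(Node^.Right);      { then the right subtree }
  end;
end;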

Chapter 7 looks at the problem of balancing binary search trees and introduces AVL trees as a solution. The AVL node insertion and deletion algorithms are discussed at length. Funnily enough, red-black trees are ignored, despite the fact that they were the data structure of choice for C++'s Standard Template Library. The chapter closes with an introduction to, and implementation of, B-trees. I wasn't very happy with this section: it glossed over some intricacies of B-trees, it didn't discuss the problems with nomenclature (some authors define a B-tree of order K differently than others), and so on. Also, the B-tree available on the CD is specific to the example in the book; it's not very extensible (I'll return to this point later).

Decision trees are next in the book, in Chapter 8. We talk about game trees and various heuristics for searching them. Quite a fascinating chapter, even though there's no mention of genetic algorithms (although the simulated annealing method comes close). The end of the chapter is reserved for various hard problems: those that can be solved when the trees have few nodes, but that prove intractable with larger numbers of nodes.

Chapter 9, and sorting was next. We start out with a small discussion on keys and comparing them. He talks a little about compressing string keys to aid in comparisons, but assumes that keys consist only of the 26 Latin characters: no pesky accents or other diacritical marks, no worries about different code pages using different collation sequences. My advice is to ignore it. Then we're away into the actual sorting algorithms. We start with selection sort. Stephens correctly characterizes it as O(N²), but then misses the big advantage of selection sort: it makes only O(N) moves, so if the time taken to swap or move elements is far greater than the time to compare them, selection sort is extremely fast. Insertion sort is tossed to one side as not being worthy of consideration (funnily enough, my tests indicate otherwise, and Sedgewick backs me up on this; I note that Stephens doesn't attempt to optimize the two loops he uses into one), except for a minor subsection on using insertion sort with linked lists. And then we find bubble sort. Actually, Stephens describes shaker sort at this point, but it doesn't matter. He prefers this sort to insertion and selection sort; I'm not sure why, though, since he doesn't provide any performance comparisons between the three. He does use a nifty trick with bubble sort that I haven't seen before: he doesn't swap elements per se; he just saves the one he wants to move and then moves all the others up one (or down one) until the correct place for the saved one is exposed. One of these days I will write a test suite to see whether Stephens' 'improved' bubble sort is better than the standard, improved insertion sort.
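Incidentally, that save-and-shift trick is exactly how an optimized insertion sort moves its elements. A minimal sketch (mine, not the book's):

procedure InsertionSort(var A: array of Integer);
var
  i, j, Tmp : Integer;
begin
  for i := Low(A) + 1 to High(A) do
  begin
    Tmp := A[i];                 { save the element being placed... }
    j := i;
    while (j > Low(A)) and (A[j - 1] > Tmp) do
    begin
      A[j] := A[j - 1];          { ...slide larger elements up one slot... }
      Dec(j);
    end;
    A[j] := Tmp;                 { ...and drop it into the gap }
  end;
end;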

And then we hit quicksort. I'm sorry, but his implementation of quicksort is pretty yucky. The code on page 237 has VB code embedded (e.g., an assignment that looks like this: "lo = min"), it's sloppy and unoptimized, and it's hard to follow (and it's not explained, to boot; I had to get a piece of paper out and desk-check it before I believed it). He discusses some of the problems with quicksort (which partition value to use; quicksorting small subfiles is slow) and provides solutions (a random partition value, and using a cutoff point for subfiles followed by a simpler sort: usually insertion sort, though he promotes selection sort). Although he mentions the median-of-three method for calculating a partition value, he neglects to point out its big benefit: it removes some extra conditions inside quicksort's inner loops, because the method automatically provides sentinels. But we hit another big howler in the improved quicksort: Stephens assumes that the Random function, when used with an integer parameter, returns a floating point value, and then uses Trunc to get an integer value. Bzzzt! Wrong. Maybe that's so in VB land, but in Delphi it already returns an integer between 0 and one less than the passed parameter.
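So the correct Delphi idiom for a random pivot index, with Lo and Hi as my invented inclusive bounds, is simply:

PivotIndex := Lo + Random(Hi - Lo + 1);   { Random(N) returns 0..N-1, already an integer }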

[a few months later—spare time was precious]

Back again. This time I shall finish this review!

After quicksort, Stephens moves on to mergesort. Competently written, but he manages to ignore the main use of mergesort: external sorting, merging to and from files on disk in order to minimize the amount of memory used. Then we're on to heapsort. For this he first describes the heap structure and how to use an array to represent it. Then he describes Floyd's algorithm for heapifying an array, a nice description this. Once those are out of the way, the standard priority queue is introduced (i.e., the one that uses a heap) and the trickle-down and bubble-up operations are shown (they're called push down and push up here). After all that, Stephens continues with the heapsort algorithm itself.

After these classic sorts, to finish off the chapter, we're shown a couple of minor ones: counting sort and bucket sort.

The big problem with this chapter for me is that, although the different algorithms were discussed competently enough, all of the "Ready-To-Run" code we were given sorted integers. What if we wanted to sort something else? Strings and the other simple types would be easy enough, but what about sorting records? What about writing a generic sort routine that would sort (gasp) anything? What problems would we run into there? How would we get round them? How would we perform comparisons?
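One answer, and the route Delphi itself takes with TList.Sort, is to sort pointers and delegate every comparison to a caller-supplied function. A sketch of the idea, with invented names:

type
  { negative if Item1 < Item2, zero if equal, positive otherwise;
    the same convention TList.Sort uses }
  TCompareFunc = function(Item1, Item2: Pointer): Integer;

procedure SortPointers(var Items: array of Pointer; Compare: TCompareFunc);
var
  i, j : Integer;
  Tmp  : Pointer;
begin
  { a simple insertion sort: the point is that all knowledge of the
    item type lives in the caller-supplied Compare function }
  for i := Low(Items) + 1 to High(Items) do
  begin
    Tmp := Items[i];
    j := i;
    while (j > Low(Items)) and (Compare(Items[j - 1], Tmp) > 0) do
    begin
      Items[j] := Items[j - 1];
      Dec(j);
    end;
    Items[j] := Tmp;
  end;
end;

Sorting records then just means writing a suitable Compare function; none of the sort code changes.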

[Added 16-Jun-1999] I've been writing about sorts myself and have a few more comments to make on Stephens' approach. Firstly, the time he spends on his version of bubble sort is not worth it, in my view. My tests have shown that a standard insertion sort beats it every time; indeed, an optimized insertion sort (which he doesn't go into) is much faster still. Unfortunately, the insertion sort he presents in the chapter suffers from the for-loop problem I've already discussed. Admittedly, his bubble sort does beat the standard bubble sort and shaker sort, but again it seems silly to spend so long discussing in depth an algorithm you would never actually use.

Having done some timing tests, I can report that his optimized quicksort routine is slower than the one we all get in TList.Sort or TStringList.Sort. Yes, you heard correctly: his optimized quicksort is slower than Borland's unoptimized one. Why? Several reasons, I suppose. He chooses the slowest way to calculate the pivot (the fastest is the mid-point method; nearly as fast, and better in certain respects, is the median-of-three method). He uses the wrong element to store the pivot (a better place is the middle element). He slows down the entire quicksort by having extraneous tests inside the fast inner loops (ideally the inner loops should be "while compare items do inc/dec index" or "repeat inc/dec index until compare items" for the fastest speed, yet his have extra conditional tests on the indexes; see the sketch after the next paragraph). And of course he uses selection sort for the small subfiles, when an optimized insertion sort is better.

Something that entirely surprised me, and something I shall continue to experiment with, was that his implementation of quicksort is the worst you can write for sorting sets of items that consist of a small number of values repeated many, many times. He mentions that this situation is bad news for quicksort, and goes into details about it. The first time I read this I accepted it. However, believe it or not, the reason it is bad is the way he calculates and places the pivot item. If you use a different method for choosing and placing the pivot, sorting this type of set is faster than sorting a randomly organized set. My research and experiments have shown that selecting the pivot correctly and placing it in the center of the list to be partitioned gives you a quicksort that works well with randomly ordered lists as well as with sorted lists, reverse-sorted lists, and lists consisting of many, many equal elements: the traditional worst-case scenarios for quicksort.
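To make this concrete, here is a sketch of the shape I'm describing (and, from memory, essentially the shape of Borland's own routine): mid-point pivot, tight scanning loops, and well-behaved on lists full of equal elements. My code, not the book's:

procedure QuickSort(var A: array of Integer; Lo, Hi: Integer);
var
  i, j, Pivot, Tmp : Integer;
begin
  i := Lo;  j := Hi;
  Pivot := A[(Lo + Hi) div 2];       { mid-point pivot }
  repeat
    while A[i] < Pivot do Inc(i);    { tight inner loops: the pivot value }
    while A[j] > Pivot do Dec(j);    { itself stops each scan, so no extra }
    if i <= j then                   { index tests are needed }
    begin
      Tmp := A[i];  A[i] := A[j];  A[j] := Tmp;
      Inc(i);  Dec(j);
    end;
  until i > j;
  if Lo < j then QuickSort(A, Lo, j);
  if i < Hi then QuickSort(A, i, Hi);
end;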

While we are on quicksort, it seems bizarre that Stephens, having spent a whole chapter on recursion and removing it, does not remove the recursion in the quicksort algorithm. One of the recursive calls is simple to remove: it's a tail-end recursive call, the simplest type to avoid, and it's the one that would make the most difference to the execution speed. The other one requires the use of an explicit stack, but is fairly simple to do nevertheless (it doesn't make much difference to the execution speed, though). Also, he mentions several times the possibility of a blown or runaway program stack with quicksort in the worst case, but doesn't mention how to avoid it (and the avoidance strategy is simple in the extreme: always sort the smallest sublist first).
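The tail-call removal and the smallest-sublist-first rule combine naturally. Extending the sketch above, a hedged version that recurses into the smaller sublist and loops on the larger, bounding the stack depth to O(log N) even in the worst case:

procedure QuickSortSafe(var A: array of Integer; Lo, Hi: Integer);
var
  i, j, Pivot, Tmp : Integer;
begin
  while Lo < Hi do
  begin
    { partition exactly as in the previous sketch }
    i := Lo;  j := Hi;
    Pivot := A[(Lo + Hi) div 2];
    repeat
      while A[i] < Pivot do Inc(i);
      while A[j] > Pivot do Dec(j);
      if i <= j then
      begin
        Tmp := A[i];  A[i] := A[j];  A[j] := Tmp;
        Inc(i);  Dec(j);
      end;
    until i > j;
    { recurse into the smaller sublist, loop on the larger }
    if (j - Lo) < (Hi - i) then
    begin
      if Lo < j then QuickSortSafe(A, Lo, j);
      Lo := i;                     { the tail call becomes iteration }
    end
    else
    begin
      if i < Hi then QuickSortSafe(A, i, Hi);
      Hi := j;
    end;
  end;
end;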

As for mergesort, he uses the approach of having an auxiliary array the same size as the original to help in the sort. In fact, it is easy to get away with an array only half the size.
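The trick, in sketch form (mine, not the book's; the dynamic-array syntax is Delphi 4 and later, so under Delphi 3 you would allocate the buffer with GetMem): copy only the left half out, then merge it back against the right half in place.

procedure Merge(var A: array of Integer; Lo, Mid, Hi: Integer);
var
  Aux     : array of Integer;
  i, j, k : Integer;
begin
  { merges A[Lo..Mid] and A[Mid+1..Hi], both already sorted }
  SetLength(Aux, Mid - Lo + 1);       { buffer only as large as the left half }
  for i := 0 to High(Aux) do
    Aux[i] := A[Lo + i];
  i := 0;  j := Mid + 1;  k := Lo;
  while (i <= High(Aux)) and (j <= Hi) do
  begin
    if Aux[i] <= A[j] then
    begin
      A[k] := Aux[i];  Inc(i);
    end
    else
    begin
      A[k] := A[j];  Inc(j);
    end;
    Inc(k);
  end;
  while i <= High(Aux) do             { flush what's left of the buffer; any }
  begin                               { leftovers in the right half are }
    A[k] := Aux[i];  Inc(i);  Inc(k); { already in place }
  end;
end;

[End addition]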

Chapter 10, and we're on to searching. A sequential search algorithm for an array is described and the routine given. A minor modification for a sorted array is shown, as well as searching through a linked list (funnily enough, the linked list in the supplied code has a dummy head node, but this is not referred to in the text). The familiar binary search algorithm is then introduced. But, oh dear, here we go again: Stephens hasn't heard of the div operator, and so uses a floating point divide and a call to Round (and anyway, why Round? Why not Trunc?). Funnily enough, he doesn't mention how to maintain a sorted array when inserting a new item; here you'd use a binary search to find where the new item should be inserted. It seems that whenever I use a binary search it's for that reason as often as for the classic "find the item" reason. A search method of minor benefit is shown next: the interpolation search. The chapter closes with a couple of improvements to binary search.
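For contrast, the idiomatic Delphi loop needs nothing but div and, as a bonus, falls out of the loop holding the insertion point. My sketch:

function BinarySearch(const A: array of Integer; Value: Integer): Integer;
var
  Lo, Mid, Hi : Integer;
begin
  Result := -1;                       { "not found" }
  Lo := Low(A);  Hi := High(A);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;           { integer divide: no Round or Trunc }
    if A[Mid] < Value then
      Lo := Mid + 1
    else if A[Mid] > Value then
      Hi := Mid - 1
    else
    begin
      Result := Mid;
      Exit;
    end;
  end;
  { on a miss, Lo is where Value should be inserted to keep A sorted }
end;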

Chapter 11 is for a discussion of hashing (one of my favorite algorithms and data containers; I think it was the first one that struck me in my first programming job after university). The big problem, in my view, with this chapter is the following. Are you ready? Nowhere in this chapter are any hashing functions described. Isn't that completely amazing? We have a chapter on hash tables, collision resolution schemes, various types of probes, disk schemes, but nowhere are we shown how to generate a hash key. It appears miraculously out of thin air. The big reason is that he assumes that the items you are inserting into a hash table are… integers. Their hash value is the integer value itself. So, like, what do we do if the item we're inserting is, you know, a string? Horror! We certainly won't find out about good hash functions from this particular chapter and book. Ready-To-Run? Not this chapter. Levity aside, this is a glaring omission. There is no discussion of hash functions: nothing about what makes a good one or a bad one, whether there could be a perfect one, and, if there were, whether it would be worth finding.
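For the record, here is the sort of thing the chapter should have shown: a classic multiply-and-add string hash. This is my sketch, nothing from the book; the multiplier 31 is traditional rather than magic, and I'm assuming overflow checking is off (Delphi's default), so the arithmetic simply wraps.

function HashString(const S: string; TableSize: Cardinal): Cardinal;
var
  i : Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    Result := (Result * 31) + Ord(S[i]);  { mix in each character }
  Result := Result mod TableSize;         { reduce to a table slot }
end;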

Chapter 12, and networks, usually called graphs. The material in this chapter is handled fairly well and he covers all the usual bases: representation in memory, traversals, spanning trees, shortest paths, topological sorts.

Chapter 13 is bizarre. To give it its title: Object-Oriented Techniques. Sounds all right to me; let's read on. He introduces the standard three benefits of OOP: encapsulation, polymorphism, and inheritance (although he calls the latter 'reuse' a little too often; is that what it's called in VB?). Bizarre Bit Number 1: he states, rightly, that a class should not allow direct access to its data. He then states that to enforce this the class should make the data private and provide public methods to read and write the data. He even gives these routines a special name: property procedures. But he doesn't talk about properties themselves. Weird or what? Why? Does he not know about the property keyword? It's the perfect way of hiding data in Delphi, for heaven's sake.
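For anyone wondering what he's missing, a property gives you the accessor routines without any call-site noise. A made-up sketch (TAccount is invented; Exception lives in SysUtils):

type
  TAccount = class
  private
    FBalance : Currency;
    procedure SetBalance(Value: Currency);
  public
    property Balance : Currency read FBalance write SetBalance;
  end;

procedure TAccount.SetBalance(Value: Currency);
begin
  if Value < 0 then
    raise Exception.Create('Balance cannot be negative');
  FBalance := Value;
end;

Client code just writes Account.Balance := 100; and the setter runs transparently: encapsulation with none of the GetThis/SetThat clutter.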

And then we get to Bizarre Bit Number 2. Ready for this? He spends the rest of the chapter talking about design patterns, those object-oriented techniques described so well in the seminal Design Patterns by Gamma et al. (a.k.a. the Gang of Four). But he doesn't call them that; oh no, that's too easy. Instead he calls them paradigms. To quote Will Watts in Developers Review, October 1998: "[I]n his new book, [he] blatantly renames patterns as 'paradigms', as though he's thought of them himself. Mr. Stephens discourtesy [in not citing Design Patterns] is foolish and smacks of theft, and Wiley should know better than to print this sort of thing." I wouldn't go as far as to say theft, but it is certainly stupid. Say I knew nothing about the GOF book and wanted to know more about paradigms. Where do I go? There are no bibliographic references in Stephens' book, and everybody else calls them patterns. I'm SOL, as they say in America.

So here we are at the end of the book. Executive summary time. Is it any good? I'd give it 5 out of 10: the algorithms are generally well described (with some blindingly obvious holes), but in my opinion a lot of the code is pretty bad. Making the algorithms generic is ignored completely, for example. Is any of the code ready to run, as advertised in the title? I'd say no, not unless you spend your entire coding life sorting integers or storing integers in hash tables, or you just happen to want the record type that Stephens used with B-trees. Is it worth buying? In my opinion, no. I get more, and much better, information from Algorithms in C by Robert Sedgewick, even allowing for the effort of converting its code to Delphi. The word that describes my overall impression of the book is sloppy.

And has Stephens learned any more since writing the book? At the time I wrote this, he was writing a monthly article for Delphi Informant, and a recent one (converted from a section in chapter 12) assumed that the maximum integer value was 32767. Sigh. Someone tell him, please.