Refactoring != Rewriting

published: Tue, 7-Mar-2006   |   updated: Sun, 23-Jul-2006

It seems simple enough to me. Martin Fowler spent a long time (as did his contributors and reviewers) making sure that Refactoring: Improving the Design of Existing Code was a supremely useful reference book as well as a treatise on how to do refactoring well. And, blimey, did they do a bang-up job or what? It's da bomb, in the current vernacular. Every time I open it, I find something new. I've even reread the initial chapters on how to refactor several times and I already feel that it's time to do so again. So why the *&#% do some developers take the word "refactoring" to mean "rewriting"?

Look back a moment at the title of the book. The subtitle is Improving the Design of Existing Code. Yes, you've already guessed, this is to be a rant: I've enboldened the word Design in order to have a ready club to beat certain developers with. Sorry, hate to be a pain and all that, but refactoring does not mean rewriting.

Unfortunately, certain developers really have taken it upon themselves to treat the word "refactor" as a cooler, more hip way of saying "rewrite." It implies they're, like, on the cutting edge of software development, dude. If you ask them, well, which particular refactorings did you use, hoping to hear a possible mention of Extract Superclass or a Move Method or one of the tens of others, you'll just get a blank stare. No, I just rewrote it, they'll say.

And then you look at their code. The design has not been improved at all. It's just a rewrite. Different classes appear, different methods are found, some other stuff got moved around. But it wasn't refactoring, because the design is just as bad, only in a different way.

Martin Fowler puts it very well: "[Refactoring] is a disciplined way to clean up code that minimizes the chances of introducing bugs. [...] With refactoring you can take a bad design, chaos even, and rework it into well-designed code. Each step is simple, even simplistic." [Preface, page xvi] Of course, it almost goes without saying that to have a hope of succeeding with refactoring you must know what a better design means and why one design is better than another. Without that guidance, you have no hope of even writing good code in the first place.

In order to refactor some code, you must have the "discipline" part of the definition. You must read and analyze the code so that you can understand the multiple simple steps needed to get to where you want the code to be. Refactoring is not hack and slash; it's the application of standard rules one after the other in order to get a better design for some well-understood value of "better".

In fact, the analysis part is almost like doing a code review of someone else's code. If you haven't done a code review (and I'm not talking about the blind checking of the application of a set of coding standards), it's hard to understand why a design could be viewed as bad. Try it one day. Download some code from SourceForge or CodeProject and open up a source file and read it. (There's a lesson that should be taught to every programmer: how to read code. If developers knew how to read code, they'd write better code in the first place.) Then think about that code. Is it well written? Do you understand what it does? Does it rely on a whole bunch of other stuff you aren't reading right now? Does it look awkward? Are the methods well-named for the work they're doing? And so on, so forth.

Here's an example of some notes I made for myself during a code review.

There is this class called Tree which, when created as an instance, isn’t (a tree, that is). The instance is just a data container with public fields and no behavior. Instead the class has static methods that define a tree and its behavior. Except that they don’t. Instead they glop a true tree of objects into an array of Tree objects. And of course, a Tree object is not a tree.

In other words: the Tree class is actually a node from a tree, with the tree functionality tacked on to the class as static methods. And the actual tree is just an array (no, don't ask how the mapping was made from the array back to a tree; let's say it won't make it into any algorithm book I'd be buying).
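To make the shape of that problem concrete, here is a minimal sketch in Python. The reviewed code was not Python and none of these names come from it: the fields, the parent-index layout of the array, and the add and depth methods are all assumptions made purely for illustration, since the notes above deliberately don't say how the real code mapped its array back to a tree.

```python
class Tree:
    """Despite the name, one instance is a single node: public data, no behavior."""

    def __init__(self, value, parent_index):
        self.value = value                # payload (hypothetical field)
        self.parent_index = parent_index  # parent's slot in the array, -1 for the root (assumed layout)

    # All the "tree" behavior lives in static methods that are handed the whole array.
    @staticmethod
    def add(nodes, value, parent_index):
        """Append a node to the flat array and return its index."""
        nodes.append(Tree(value, parent_index))
        return len(nodes) - 1

    @staticmethod
    def depth(nodes, index):
        """Count the links from the node at `index` up to the root."""
        steps = 0
        while nodes[index].parent_index != -1:
            index = nodes[index].parent_index
            steps += 1
        return steps


nodes = []                               # the actual "tree" is just this list
root = Tree.add(nodes, "root", -1)
child = Tree.add(nodes, "child", root)
leaf = Tree.add(nodes, "leaf", child)
```

Nothing called Tree here is a tree: each instance is a node, and the tree is whatever list the caller happens to be passing around.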

But, in essence, this one class should be two. One for the tree and one for a node in the tree. With this insight, we have a plan, a goal. We can use Extract Class to go the first step and extract both possible classes (TreeNode, Tree) from the one bucket of glop. We can then analyze the code in the Tree methods to determine whether it applies more to the TreeNode or the Tree and use Move Method. And so on, so forth, one step at a time. Maybe other classes will present themselves as you go through, so you end up with more than two classes.
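One possible end state of those steps, as a hedged sketch in Python (not the language of the reviewed code): the names TreeNode and Tree come from the plan above, while the fields and the add_child, depth and size methods are hypothetical stand-ins for whatever the real static methods did.

```python
class TreeNode:
    """A single node: owns its value and its links to parent and children."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []

    def add_child(self, value):
        """Node-level behavior, the kind of thing Move Method would put here."""
        child = TreeNode(value, parent=self)
        self.children.append(child)
        return child

    def depth(self):
        """Count the links from this node up to the root."""
        steps = 0
        node = self
        while node.parent is not None:
            node = node.parent
            steps += 1
        return steps


class Tree:
    """The tree itself: owns the root and the whole-tree operations."""

    def __init__(self, root_value):
        self.root = TreeNode(root_value)

    def size(self):
        """Count every node reachable from the root."""
        count = 0
        pending = [self.root]
        while pending:
            node = pending.pop()
            count += 1
            pending.extend(node.children)
        return count


tree = Tree("root")
child = tree.root.add_child("child")
leaf = child.add_child("leaf")
```

You wouldn't jump straight to this in one go, of course. Each Extract Class or Move Method is its own small step, and the point of the sketch is only that a question about a node now gets asked of the node and a question about the tree gets asked of the tree.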

(Aside: sometimes I wonder whether the standard of having one class per source file in C# causes or reinforces this kind of glop. I must admit I sometimes go through several iterations when deciding on the names of classes. Every time, I must change the name of the file too. And then it has to be checked into the source control manager and the old file removed, or mapped to the new one, and so on.)

Ach, but then all this work also requires us to have a good set of unit tests to ensure we don't change functionality, that we don't introduce bugs. But that is another post for another day.

But the essence of my argument is that refactoring is about making better designs from existing code, and that there are a bunch of recipes for doing it, a lot of them collected in the above refactoring bible. It's not about just rewriting the code in an ad hoc fashion, and the sooner many developers realize this, the better off we'll all be.