Cyclomatic Complexity

published: Tue, 13-Dec-2005   |   updated: Wed, 14-Dec-2005

Something that came up in a conversation: what is cyclomatic complexity (CC)? If I gave you a method, how would you calculate the CC of the method from scratch? Actually the second question assumes you know what CC is, so I'm cheating a little. But bear this in mind: managing complexity is an important part of being a good software designer and programmer and CC is a measure of complexity.

You're writing some code. Is it simple to understand? Is it complex? Why is it important to think of your code in such terms? Well, I'd have to say that, unless you are writing the mythical one-off program, someone, maybe you, is going to have to read and understand the code at some stage in order to modify it, or fix it, or add some extra functionality.

Perhaps you'd say one way to describe the complexity of a method is to count the number of lines of code in the method. The more lines of code, the more complex the method is; the fewer lines of code there are, the simpler it is. Easy, right?

You're left with a problem though. What is a line of code? Does the syntactical fluff of your language of preference count? (That is, the begins, ends, braces, and other tokens that delineate blocks of code in the method.) So, for example, are there two lines of code in the following Delphi fragment or three?

if (i >= 0) then begin
  List.Delete(i);
end;

Well, if you count the end statement there are three, but the end statement is not really about complexity, it's there to satisfy the language syntax. Really we should ignore these pieces of syntactic fluff and just count the statements that are doing something.

But we're not finished yet. What about statements that, to make the code easier to read, are continued on several lines? Are they one line of code, or several, one per continuation line? What about declarations of variables (the var block in Delphi is an example)? What about the declaration of the method itself and so on?

What about comments? Heck, no. White space (new lines)? Double-heck, no. Compiler ifdefs, regions? No, way. I think. Unless… Er, dunno.

If a method is an initialization method for some object it will probably consist of a dozen or more statements that set properties, arranged in a sequence. No conditional or looping statements to be found. Is that more complex than another method that does the same for an object with fewer properties? Probably, but I'd have to say, in essence, they're both as complex as each other.

So, I think you can see that taking the lines of code measurement as a complexity metric is just not good enough. It doesn't adequately express the complexity of the method in any meaningful way. More likely it is a very approximate way to describe complexity (methods that are hard to understand tend to have more lines of code -- or use deliberately confusing identifiers, perhaps -- than ones that are easier to understand).

Can we do better?

One better metric was invented by Tom McCabe in, believe it or not, 1976 (Next year, the paper he wrote will be 30 years old and yet, still, many developers have not heard of either the paper or the concept.) He called it cyclomatic complexity, and it is a description of the complexity of the control flow of a program (or a method, in our case). To put it another way, it's a numerical description of the complexity of a flowchart: the more decision boxes in a flowchart, the more complex the control flow.

By analyzing many software projects, both before and after the software was released, he showed that the reliability of a program is inversely correlated and the number of errors a program exhibits is directly correlated to its control flow complexity. In other words, the more complex a program is, the less reliable it is and the more errors show up after the program is released.

McCabe's research has been confirmed in later years by other researchers. His findings have been extended by additional metrics (such as, the number of local variables coupled with the cyclomatic complexity is a better indicator of a program's reliability), but the CC metric is still an extremely good indicator of how "good" a program is.

Reread the previous two paragraphs and mentally replace the word "program" with "method". In essence, the reliability (whether we'll find bugs after it has been used) of a method is inversely correlated with its CC value. If we write simple methods we'll be less likely to write buggy ones. Well, d'oh!

So how do you calculate the CC value for a method? McCabe's method was defined in terms of decision points in flowcharts, but we tend not to draw those these days. Instead here’s his algorithm for calculating the CC value for a method defined in terms of the tokens of the language you use.

  1. Start with 1. All this says is that, if called, the method will finish; there is at least one path through the method.
  2. Add 1 for each conditional statement or looping statement. (So, add 1 for an if, or a while, or a foreach, or a ternary operator, etc.)
  3. Add 1 for each AND or OR logical operator used in a condition.
  4. Add 1 for each case in a case or switch statement. If the case/switch statement doesn't have an explicit default case, add 1.
  5. Add 1 for each catch statement (for Delphi: the except statement).

That's all. Pretty easy to calculate really. Here's a couple of examples to illustrate the calculation and to exercise your counting abilities:

public WaitableThread(ThreadStart start) {
  this.start = start; 
  this.signal = new ManualResetEvent(false);
  this.SafeWaitHandle = signal.SafeWaitHandle;
  this.thread = new Thread(new ThreadStart(ExecuteDelegate));
}

This method has a CC value of 1. No conditionals, loops, or anything else of interest.

public bool Dequeue(out T item) {
  foreach (LockFreeQueue q in queueList) {
    if (q.Dequeue(out item)) {
      return true;
    }
  }
  item = default(T);
  return false;
}

This method has a CC value of 3. Count 1 for the main path, 1 for the foreach, and 1 for the if.

Anyway, at the end of this little calculation you'll get a value that's at least 1. If it is exactly 1, then the CC value shows that there is only one path through the method; it is merely a sequence of one or more statements. Between 1 and 5, you shouldn't worry particularly about the method: it's not that complex. If the value were between 5 and 10, I'd have to say that the method could do with some refactoring. It's getting a little too complex to understand in one go. Any method that has a CC value of 10 or more is definitely too complex and should be refactored.

Put it another way. In essence, the CC value is a measure of the number of paths through the method and so indicates at least the number of tests you need to write to properly exercise the method. "At least", note. There will probably be more. So a method of complexity 10 will require, at a minimum, 10 tests to be written against it to ensure that all paths are exercised and checked. Which would you rather: test a method with CC 2, or a method with CC 20?

All this plays nicely into the TDD method of designing and writing software. Once you've mastered the CC calculation, you'll be able to intuitively decide whether a method is getting too complex and needs refactoring. You'll get into the habit of avoiding big case statements. You'll learn the various design patterns that enable you to refactor your code into something less complex. You'll become a better developer. Personally I like methods with a CC value of 5 or less, generally way less than 5. 1 is pure goodness, 2 or 3 is OK. With 4 or more I tend to be looking for ways to simplify the method.

Update: 14-Dec-2005

Although in the algorithm I wrote try in the fifth step, I really meant catch, since it is this statement that implies a conditional statement ("if an exception is thrown, do this"). A finally doesn't really mean a conditional branch since it implies that the code should happen no matter what. McCabe's original algorithm doesn't really have anything to say about this (it was devised well before exceptions were invented), and I'm not really sure how I'd show a try..catch in a flowchart anyway, unless there were a conditional test after every statement in a try block. I think that's overkill, otherwise adding a try..catch block would blow your CC value sky-high very easily.