Writing a priority queue in C# (part 3)
published: Tue, 17-Feb-2004 | updated: Tue, 20-Apr-2004
In the previous installments of this occasional series (
here's part 1 and
here's part 2), we looked at
how to write a priority queue in C# that follows the
standards set by the System.Collections
namespace.
At the end of the last installment we had a
PriorityQueue
class that used the heap algorithm, that
stored object
instances, that used integers as the
priority, and that implemented ICollection
(so that it
could be viewed as a .NET collection).
Before we move on, I should admit that the code I published last time has a subtle bug. Not that the priority queue doesn't enqueue and dequeue items properly, you understand, but that it likes to keep hold of objects well beyond their possible deaths.
To understand what the bug is, we need to quickly review some basic .NET functionality. In .NET you don't have to free any of the objects you create, since the .NET Common Language Runtime includes a garbage collector that does it for you. This garbage collector periodically marks all objects that are reachable and sweeps up the remainder (the unreachable ones) to reclaim their heap memory. (There's a little more to it than that, of course, but this will suffice for now.)
So, what are unreachable objects? Well, when the garbage collector (let's call it GC from now on) runs, it works out which objects have active references. It looks at the currently executing method (so all its local variables will be marked as reachable), as well as the chain of methods that called it. Some objects will contain references to other objects, so those also get marked as reachable. The GC, in essence, creates a graph of objects (a graph is a data structure consisting of vertices — the reachable objects, in our case — and edges — the links between the objects). After this analysis is complete, the GC knows which objects are reachable from the currently executing code, and therefore which objects are not. It then sweeps up (or compacts) the unreachable memory.
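To see reachability in action, here's a small self-contained sketch of my own (not from the series' code): a WeakReference lets us ask the GC whether it still considers an object reachable, without itself keeping the object alive.

```csharp
using System;

class ReachabilityDemo
{
  static void Main()
  {
    // A local variable is a root: this object stays reachable.
    object rooted = new object();
    WeakReference weakToRooted = new WeakReference(rooted);

    // This object has no strong reference at all once the
    // constructor call returns, so it is unreachable.
    WeakReference weakToUnrooted = new WeakReference(new object());

    GC.Collect();
    GC.WaitForPendingFinalizers();

    // The rooted object survives the collection; the unrooted one
    // is swept (in a release build; a debug-build JIT may extend
    // object lifetimes and muddy the result).
    Console.WriteLine(weakToRooted.IsAlive);
    Console.WriteLine(weakToUnrooted.IsAlive);

    GC.KeepAlive(rooted);
  }
}
```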
So, let's go back to our priority queue and especially the
Dequeue()
method.
public object Dequeue() {
  if (count == 0)
    throw new InvalidOperationException();

  object result = heap[0].Item;
  count--;
  trickleDown(0, heap[count]);
  version++;
  return result;
}
What's happening here? Well, we save the object at the top of the heap (the zeroth item), decrement the count (there's now one less item in the heap), and we trickle down the item at the last index of the heap.
Actually, the English language is masking the problem. Suppose we had 10 items in the heap to begin with, numbered 0 to 9. We make a copy of object 0, decrement the count to 9 (so that the items are numbered from 0 to 8), and then trickle down the item at position 9. This does not cause an out-of-bounds exception, by the way; the array onto which we're mapping the heap has many more items available.
Anyway, after the trickle down operation, the old items from 1 to 8, together with the old item 9, will have been rearranged in some way to form a new ordering of the nine items. Any reference to the old item 0 will have disappeared in the process (luckily we have a copy). However — and here's the bug — there will still be a reference to the old item 9 at index 9 in the array.
In other words, if we were to enqueue ten items, dequeue them all, and abandon these dequeued references, the ten objects will not die, at least until the priority queue is allowed to die.
PriorityQueue pq = new PriorityQueue();
for (int i = 0; i < 10; i++) {
  pq.Enqueue(new object(), i);
}
while (pq.Count != 0) {
  // dequeue and throw away the object
  pq.Dequeue();
}
// but, all 10 objects are still alive!
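In fact, we can observe those lingering references directly. Here's a sketch of my own (it assumes the PriorityQueue class built in this series is available): we track each enqueued object with a WeakReference, dequeue and discard everything, force a collection, and then ask whether the objects died.

```csharp
using System;
using System.Collections;

class LeakCheck
{
  static void Main()
  {
    // Track each enqueued object with a WeakReference so we can ask
    // the GC whether it is still reachable after being dequeued.
    ArrayList refs = new ArrayList();
    PriorityQueue pq = new PriorityQueue();

    for (int i = 0; i < 10; i++)
    {
      object o = new object();
      refs.Add(new WeakReference(o));
      pq.Enqueue(o, i);
    }

    while (pq.Count != 0)
      pq.Dequeue();  // dequeue and throw away the object

    GC.Collect();
    GC.WaitForPendingFinalizers();

    // With the buggy Dequeue(), the heap array still holds a
    // reference to every object, so each IsAlive reports true.
    foreach (WeakReference wr in refs)
      Console.WriteLine(wr.IsAlive);
  }
}
```

With the fixed Dequeue() shown below, the same program reports the objects as dead.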
This is bad news indeed. The Dequeue()
method should in
fact be coded like this:
public object Dequeue() {
  if (count == 0)
    throw new InvalidOperationException();

  object result = heap[0].Item;
  count--;
  trickleDown(0, heap[count]);
  heap[count].Clear();
  version++;
  return result;
}
Here we make use of a new method Clear()
of the
HeapEntry
struct that resets its item reference and
priority.
Now that we've cleared up this anomaly, let's move on to some extra functionality.
We've so far agreed to make the priority an integer value: higher values have higher priority. I can imagine other types of priority as well, perhaps integers where lower values have higher priority, perhaps floating point values, perhaps a full-blown priority class. We should change our priority queue so that the priorities it uses are not assumed to be simple ascending integers.
The easiest way to do this is to make the priority an
IComparable
instance instead (that is, an instance of a
class that implements IComparable.CompareTo()
). As it
turns out, apart from the various method signatures that accept a
priority, we only have to change two lines of code, where we compared
the integer values.
First, let's write the test:
[Test]
public void TestPriorityType() {
  pq.Enqueue("item one", new ReversePriority(12));
  pq.Enqueue("item two", new ReversePriority(5));

  string s = (string) pq.Dequeue();
  Assertion.AssertEquals(
    "Dequeuing item should set count to 1",
    1, pq.Count);
  Assertion.AssertEquals(
    "Dequeuing should retrieve highest priority item",
    "item two", s);
  s = (string) pq.Dequeue();
  Assertion.AssertEquals(
    "Dequeuing final item should set count to 0",
    0, pq.Count);
  Assertion.AssertEquals(
    "Dequeuing should retrieve next item",
    "item one", s);
}
This fails, of course. My intent here is to have the
ReversePriority
class organize priorities in reverse
order, so to make the test pass we first have to write this class:
public class ReversePriority : IComparable {

  private int priority;

  public ReversePriority(int priority) {
    this.priority = priority;
  }

  public int CompareTo(object obj) {
    ReversePriority castObj = obj as ReversePriority;
    if (castObj == null)
      throw new InvalidCastException();
    if (priority < castObj.priority)
      return 1;
    if (priority == castObj.priority)
      return 0;
    return -1;
  }
}
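To see the reversed ordering in isolation, we can exercise CompareTo() directly. This sketch duplicates the class above and adds a small driver of my own:

```csharp
using System;

public class ReversePriority : IComparable
{
  private int priority;

  public ReversePriority(int priority)
  {
    this.priority = priority;
  }

  public int CompareTo(object obj)
  {
    ReversePriority castObj = obj as ReversePriority;
    if (castObj == null)
      throw new InvalidCastException();
    if (priority < castObj.priority)
      return 1;
    if (priority == castObj.priority)
      return 0;
    return -1;
  }
}

class Driver
{
  static void Main()
  {
    ReversePriority urgent = new ReversePriority(5);
    ReversePriority routine = new ReversePriority(12);

    // The lower integer compares as the greater priority.
    Console.WriteLine(urgent.CompareTo(routine));   // 1
    Console.WriteLine(routine.CompareTo(urgent));   // -1
    Console.WriteLine(urgent.CompareTo(new ReversePriority(5)));  // 0
  }
}
```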
Now we can write the code in HeapEntry
and
PriorityQueue
that will make the test pass:
public struct HeapEntry {
  private object item;
  private IComparable priority;

  public HeapEntry(object item, IComparable priority) {
    this.item = item;
    this.priority = priority;
  }

  public object Item {
    get { return item; }
  }

  public IComparable Priority {
    get { return priority; }
  }

  public void Clear() {
    item = null;
    priority = null;
  }
}

public class PriorityQueue : ICollection {
  ...
  public void Enqueue(object item, IComparable priority) {
    if (count == capacity)
      growHeap();
    count++;
    bubbleUp(count - 1, new HeapEntry(item, priority));
    version++;
  }

  private void bubbleUp(int index, HeapEntry he) {
    int parent = getParent(index);
    // note: (index > 0) means there is a parent
    while ((index > 0) &&
           (heap[parent].Priority.CompareTo(he.Priority) < 0)) {
      heap[index] = heap[parent];
      index = parent;
      parent = getParent(index);
    }
    heap[index] = he;
  }
  ...
  private void trickleDown(int index, HeapEntry he) {
    int child = getLeftChild(index);
    while (child < count) {
      if (((child + 1) < count) &&
          (heap[child].Priority.CompareTo(heap[child + 1].Priority) < 0)) {
        child++;
      }
      heap[index] = heap[child];
      index = child;
      child = getLeftChild(index);
    }
    bubbleUp(index, he);
  }
  ...
}
Nothing too surprising here, as I've already said. In essence the
priority is an IComparable
instance, and the comparisons
between two priorities are done with CompareTo()
.
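Incidentally, the excerpt above calls two index helpers, getParent() and getLeftChild(), that aren't shown. For a binary heap mapped onto an array they are conventionally defined like this (a sketch; I'm assuming the implementation uses the standard formulas):

```csharp
// For a heap stored in an array, the node at index i has its
// children at indexes 2i + 1 and 2i + 2, and its parent at
// (i - 1) / 2 (integer division).
private int getParent(int index) {
  return (index - 1) / 2;
}

private int getLeftChild(int index) {
  return (2 * index) + 1;
}
```

With ten items (indexes 0 to 9), for example, the parent of the last item at index 9 is index 4, and index 4's left child is index 9 again.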
The next thing we should consider for our PriorityQueue
class is the serialization of an instance. On the one hand this is
really simple (just add the Serializable
attribute to the
PriorityQueue
and the HeapEntry
classes),
but on the other hand the simple answer is very wasteful, at least in
terms of space. Let's see why.
The PriorityQueue
class contains an array of
HeapEntry
items. This array is deliberately created too
large so that we have room to add new items to the priority queue
without constantly growing the array. If we just add the
Serializable
attribute to the PriorityQueue
class, the serialization code will save all of our members (good), but
will, in serializing the array, serialize a whole set of null entries
(not so good). For a little extra effort, we can serialize only the
non-empty elements in the array.
First thing we do is to write the serialization test.
[Test]
public void TestSerialization() {
  pq.Enqueue("item one", new ReversePriority(12));
  pq.Enqueue("item two", new ReversePriority(5));

  Stream s = new MemoryStream();
  IFormatter f = new BinaryFormatter();
  f.Serialize(s, pq);
  s.Position = 0;
  PriorityQueue pq2 = (PriorityQueue) f.Deserialize(s);
  s.Close();

  string str = (string) pq2.Dequeue();
  Assertion.AssertEquals(
    "Dequeuing item should set count to 1",
    1, pq2.Count);
  Assertion.AssertEquals(
    "Dequeuing should retrieve highest priority item",
    "item two", str);
  str = (string) pq2.Dequeue();
  Assertion.AssertEquals(
    "Dequeuing final item should set count to 0",
    0, pq2.Count);
  Assertion.AssertEquals(
    "Dequeuing should retrieve next item",
    "item one", str);
}
This test fails, of course, so the first thing we should do is to mark
the classes with the Serializable
attribute, and re-run
the test.
[Serializable()]
public struct HeapEntry {
  ...
}

[Serializable()]
public class PriorityQueue : ICollection {
  ...
}
The test passes; however, we've already discussed that the size of the serialized priority queue will generally be larger than it should be.
Let's refactor. Mark the PriorityQueue
class as
implementing ISerializable
. By doing so, we are telling
the serialization engine that we'll be doing the serializing. In
adding the ISerializable
interface, we must implement the
GetObjectData()
method:
void ISerializable.GetObjectData(SerializationInfo info,
                                 StreamingContext context) {
  info.AddValue("capacity", capacity);
  info.AddValue("count", count);
  HeapEntry[] heapCopy = new HeapEntry[count];
  Array.Copy(heap, 0, heapCopy, 0, count);
  info.AddValue("heapentries", heapCopy, typeof(HeapEntry[]));
}
What's happening here? Well, first of all, we shall serialize the capacity and the count values. We need these later when we deserialize the object, so that we can reconstitute the queue. Now we need to serialize the heap array, but as we've stated, it could have a lot of empty entries. So we create a temporary array of exactly the right size, copy all the entries into it and then serialize this temporary array.
What about deserialization? Well, here we need to write a protected constructor for the deserialization engine.
protected PriorityQueue(SerializationInfo info,
                        StreamingContext context) {
  capacity = info.GetInt32("capacity");
  count = info.GetInt32("count");
  HeapEntry[] heapCopy = (HeapEntry[])
    info.GetValue("heapentries", typeof(HeapEntry[]));
  heap = new HeapEntry[capacity];
  Array.Copy(heapCopy, 0, heap, 0, count);
  version = 0;
}
First we read the capacity and the count. Then we read the array of
entries, count
items in all. At this point, we can create
our internal array (of size capacity
, not
count
), and then copy the entries from the temporary
array into the internal array. Finally we set version
to
0 (we didn't serialize it, though we could have).
Finally, after all this refactoring, we run the tests again to show that they still pass.
(By the way, notice that we're repeating the names of the serialized values in both the constructor and GetObjectData(). A definite code smell that we should remove by using some constants. I won't show that code here.)
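For completeness, the fix is simply to hoist the names into constants shared by both methods, something like this sketch (the constant names here are my own invention):

```csharp
public class PriorityQueue : ICollection, ISerializable {
  // Single definitions of the serialization keys, shared by
  // GetObjectData() and the deserialization constructor, so the
  // two can never drift out of step.
  private const string capacityName = "capacity";
  private const string countName = "count";
  private const string heapName = "heapentries";

  protected PriorityQueue(SerializationInfo info,
                          StreamingContext context) {
    capacity = info.GetInt32(capacityName);
    count = info.GetInt32(countName);
    HeapEntry[] heapCopy = (HeapEntry[])
      info.GetValue(heapName, typeof(HeapEntry[]));
    heap = new HeapEntry[capacity];
    Array.Copy(heapCopy, 0, heap, 0, count);
    version = 0;
  }

  void ISerializable.GetObjectData(SerializationInfo info,
                                   StreamingContext context) {
    info.AddValue(capacityName, capacity);
    info.AddValue(countName, count);
    HeapEntry[] heapCopy = new HeapEntry[count];
    Array.Copy(heap, 0, heapCopy, 0, count);
    info.AddValue(heapName, heapCopy, typeof(HeapEntry[]));
  }
  ...
}
```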
This time around in our series on implementing a priority queue in C#,
we've discussed a subtle bug in the previous version, one that wasn't
(and couldn't really be) brought out by the unit tests, and then
solved it. Then we changed the definition of a priority from an
integer to an instance of IComparable
, so that we could
do some more interesting things with priorities. Finally, we showed
how to implement efficient serialization for the priority queue class.
The latest code (as of the end of this article) can be found here.