Profiling and optimizing Huffman

The Huffman example I posted last month has the virtue of being quite simple. On the flip-side, it has some room for improvement, particularly when it comes to performance.

Of course no less a computer scientist than Don Knuth once said that “Premature Optimization is the root of all evil.” So it makes sense to look for the most obvious bottlenecks first.

simplehuffman2.c is the optimized version. A description of the approach follows.

The main trick of optimization is of course to first find out where the program is spending its time. That way you can spend your effort fixing these ‘hotspots’ rather than wasting it on code that has little or no relevance to performance. For this, a good profiler is essential. On Linux, I’ve found oprofile to be an excellent tool, with numerous good tutorials.

This is the profile for simplehuffman when encoding a 10MB file:

Prof1

Perhaps unsurprisingly, the choice of fgetc/fputc for disk I/O is very costly (accounting for half the runtime of the encoding operation.

The way to avoid single-character disk I/O is to use some sort of buffering. We can read many characters at a time into a temporary buffer, and then grab them from the buffer to process. Writing can be done the same way – first build the output in a buffer, then write it out in one step. All of this requires a little bookkeeping but in return, to read 1k characters, we only need a single call to fread(), and then a bunch of array accesses, not 1k calls to fgetc.

So in simplehuffman-2.c, I did precisely that for the reading routines.

Prof2

As expected, fgetc has basically dropped off the profile.

What does this mean for actual performance? Well, the user time to encode a 10MB file dropped by 40%. Not bad for an additional 10 lines of code!

Now let’s see what happens when we decode the same file.

Prof1 dec

Hmm… ReadBit is quite a bottleneck. What ReadBit does is to basically extract bits off from a char and return them. Once it has exhausted those 8 bits, it reads another character from the disk and continues.

The way that the bits are extracted is divide and mod. This is convenient, but in practice a bit naive, since the division will only every give back 0 or 1.

nextbit = bits / bitcount; bits %= bitcount; bitcount /= 2; return nextbit;

Another approach is to use the bitwise operators. We only care about that one bit, so we can & it with a mask, and return it.

nextbit = bits & bitcount; bits -= nextbit; bitcount >>= 1; return (nextbit != 0);

Admittedly, it doesn’t look much more exciting. But here’s the new profile:

Prof2 dec

ReadBit using dropped dramatically. Overall, runtime dropped more than 50% decoding a 10MB file as a result of swapping those 4 lines of code.

Here is the modified version of the program: simplehuffman2.c

Left 404

Adrift in the 21st Century

Profiling and optimizing Huffman