simplehuffman is a simple Huffman encoding/decoding program written in C. While it was written primarily as an exercise in clean coding and optimization, it can indeed encode and decode files, and for large ASCII text files it achieves compression of typically around 35%.
The original simplehuffman.c code was written to be easy to read, with no attention paid to optimization opportunities.
simplehuffman2.c made use of the profiler to find the program’s hot spots, and based on those findings made a few changes to improve I/O and decoding performance.
simplehuffman3.c continued the process, eliminating byte-by-byte I/O in favor of buffering for all I/O.
simplehuffman4.c changed the representation used for bitstrings to substantially improve encoding performance (again).
I benchmarked the different versions by encoding and decoding the same 75MB tar/gz file. The net improvement between the first and fourth version of the program is around 3.5x, both for encoding and decoding. The final version is encoding at 35MB/s+ and decoding at 11MB/s+.
Version | Encode (time) | Speedup | Decode (time) | Speedup |
1 | 7.022 | 100% | 23.026 | 100% |
2 | 4.242 | 165.5% | 10.041 | 229.3% |
3 | 4.131 | 170.0% | 7.764 | 296.6% |
4 | 2.060 | 340.9% | 6.241 | 368.9% |
Executables were built with gcc 4.6.1 using -O3 optimizations, and run on a 1.6GHZ AMD Sempron processor.