Optimizing Huffman, part 2

In spite of the improvements made to the Huffman program’s use of I/O functions in the previous post, it’s fairly clear that more can be done.  In particular, looking at the profiles for encoding and decoding, the fgetc/fputc pair still account for a significant amount of the program’s runtime.  While we succeeded in getting rid of them for reading and writing normal (unencoded) data, we still used them for the encoded data.  It turns out that this isn’t too hard to fix either.

simplehuffman3.c implements this idea.

Continue reading

Profiling and optimizing Huffman

The Huffman example I posted last month has the virtue of being quite simple.  On the flip-side, it has some room for improvement, particularly when it comes to performance.

Of course no less a computer scientist than Don Knuth once said that “Premature Optimization is the root of all evil.”  So it makes sense to look for the most obvious bottlenecks first.

simplehuffman2.c is the optimized version.  A description of the approach follows.

Continue reading

A simple Huffman implementation

Huffman encoding uses some interesting insights about data to achieve compression.  It’s reasonably simple, and exposes some interesting questions about data representation and optimization.

simplehuffman.c – (300 lines of C) is a program I wrote in an afternoon to do Huffman encoding and decoding of files.  It will work on any type of file and can compress and decompress a 15MB text file in under 2 seconds on current hardware.

Continue reading

Lightroom oddity

Lightroom CPU usageAdobe Lightroom is pretty much the de-facto standard for DAM (digital asset management) and bulk editing among professional and serious amateur digital photographers. As a heavy user since the beta of version 1 in mid-2006 (we are currently at version 3.4), I can appreciate why that is. Lightroom offers a reasonably intuitive well-designed interface for editing and organizing large numbers of images. It has a number of minor flaws, but it compares favorably to pretty much every competing software package I’ve tried.

That said, from an architecture point of view, it seems that Lightroom could stand some improvement. My basic complaint is that the software is sluggish. Not tear-your-hair out slow mind you, but lethargic enough to be annoying. And this is when dealing with 12MP RAW image files, on a machine that has 4 2.8GHZ cores and 6GB RAM. That is to say image sizes aren’t that large, and the hardware is pretty current.

Continue reading

Leaker: a simple C leak detection example

Following up on my previous post, I’ve posted right below an actual working example of a simple C leak detector. It pretty much follows the outline given in that post. There is a global table to keep track of allocations and deallocations. Anything left in the table at the end of the program is known to have been leaked. By recording filenames and line numbers as well as memory addresses, the table makes it possible to know exactly where leaked allocations were made.

Leaker code. ~180 lines of ANSI C.

Continue reading

Disk I/O in C – avoid fgetc/fputc

Most programs don’t do that much disk I/O. But for those that do, the disk I/O is often the bottleneck. I discovered this firsthand working on my Huffman tree encoding program. On small data files, it doesn’t matter much how you read or write your data, but when the filesizes get into hundreds of megabytes, implementation makes a big difference.

The 3 basic C functions for reading (and writing) unformatted text files are fgetc/fputc, fgets/fputs, and fread/fwrite. fgetc reads in one character at a time, fgets one line at a time and fread some number of ‘records’ at a time. As a record is just a sequence of bytes of a defined length, what fread really let you do is read in data in chunks whose size are recordsize * number of records. The respective put/write functions do the reverse.

Continue reading

Detecting Memory Leaks in C

Memory leaks are a common occurrence in C programs, particularly for novice programmers.  Because memory allocation and deallocation is left to the programmers, it’s not difficult to lose track of what was allocated where, and forget to deallocate it once it is no longer in use.  The is particularly true for more complex data structures which may have dynamically allocated blocks of memory that reference other dynamically allocated blocks of memory.

There are a fair number of tools to deal with this, like valgrind.  Still, building a simple detector is a relatively straightforward exercise, and can often reveal interesting things.

Continue reading

Programmer vs. Computer Time

I’ve always been predominantly a C programmer, mainly simply from habit. However, recent projects using Perl have me thinking about the tradeoffs involved. The big issue seems to be time – both at the programmer’s end and at the computer’s.

C compared to languages like Java or Perl generally offers greater flexibility and performance. On the flip side, it makes programming tasks more complex and debugging more difficult. C requires more time from the programmer, in order to minimize time spent running on the computer. Higher-level languages on the other hand, require less from the programmer, at the expense of longer run-time.

Continue reading

C – not a good starting language

In most computer science programs, the first course in the sequence is a basic programming course. Usually, this course introduces the standard concepts, along with one of the more common programming language. And at least from what I’ve seen, that language is usually C.

From what I’ve seen in the classes I’ve taken, and particularly from what I’ve seen tutoring at the college this quarter, this is a mistake. Don’t get me wrong, C is a great programming language. I learned on it. I like it. I use it regularly. But as a first programming language for beginning students it leaves a lot to be desired.

Continue reading

Finding Memory Errors in C

Memory errors are among the nastier programming mistake one can make when using C. Unlike syntax errors, the compiler doesn’t flag them for you. Unlike basic logic errors, you won’t necessarily see the problem in the output. The result – silent data corruption – usually manifests itself differently under different systems and compilers which makes it especially obnoxious to deal with.

Fortunately, there are a couple of good tools for debugging memory errors. My favorite is valgrind, a memory debugger modeled after Rational’s purify (the commercial standard) that runs on Linux and now (experimentally) on MacOS X.

Continue reading