Optimizing Huffman, part 2

Posted on 6 August, 2011 by dara

In spite of the improvements made to the Huffman program’s use of I/O functions in the previous post, it’s fairly clear that more can be done. In particular, looking at the profiles for encoding and decoding, the fgetc/fputc pair still account for a significant amount of the program’s runtime. While we succeeded in getting rid of them for reading and writing normal (unencoded) data, we still used them for the encoded data. It turns out that this isn’t too hard to fix either.

simplehuffman3.c implements this idea.

Continue reading →

Profiling and optimizing Huffman

Posted on 3 August, 2011 by dara

The Huffman example I posted last month has the virtue of being quite simple. On the flip-side, it has some room for improvement, particularly when it comes to performance.

Of course no less a computer scientist than Don Knuth once said that “Premature Optimization is the root of all evil.” So it makes sense to look for the most obvious bottlenecks first.

simplehuffman2.c is the optimized version. A description of the approach follows.

Continue reading →

A simple Huffman implementation

Posted on 25 July, 2011 by dara

Huffman encoding uses some interesting insights about data to achieve compression. It’s reasonably simple, and exposes some interesting questions about data representation and optimization.

simplehuffman.c – (300 lines of C) is a program I wrote in an afternoon to do Huffman encoding and decoding of files. It will work on any type of file and can compress and decompress a 15MB text file in under 2 seconds on current hardware.

Continue reading →

Lightroom oddity

Posted on 20 March, 2011 by dara

Lightroom CPU usage Adobe Lightroom is pretty much the de-facto standard for DAM (digital asset management) and bulk editing among professional and serious amateur digital photographers. As a heavy user since the beta of version 1 in mid-2006 (we are currently at version 3.4), I can appreciate why that is. Lightroom offers a reasonably intuitive well-designed interface for editing and organizing large numbers of images. It has a number of minor flaws, but it compares favorably to pretty much every competing software package I’ve tried.

That said, from an architecture point of view, it seems that Lightroom could stand some improvement. My basic complaint is that the software is sluggish. Not tear-your-hair out slow mind you, but lethargic enough to be annoying. And this is when dealing with 12MP RAW image files, on a machine that has 4 2.8GHZ cores and 6GB RAM. That is to say image sizes aren’t that large, and the hardware is pretty current.

Continue reading →

Leaker: a simple C leak detection example

Posted on 19 March, 2011 by dara

Following up on my previous post, I’ve posted right below an actual working example of a simple C leak detector. It pretty much follows the outline given in that post. There is a global table to keep track of allocations and deallocations. Anything left in the table at the end of the program is known to have been leaked. By recording filenames and line numbers as well as memory addresses, the table makes it possible to know exactly where leaked allocations were made.

Leaker code. ~180 lines of ANSI C.

Continue reading →

Disk I/O in C – avoid fgetc/fputc

Posted on 17 March, 2011 by dara

Most programs don’t do that much disk I/O. But for those that do, the disk I/O is often the bottleneck. I discovered this firsthand working on my Huffman tree encoding program. On small data files, it doesn’t matter much how you read or write your data, but when the filesizes get into hundreds of megabytes, implementation makes a big difference.

The 3 basic C functions for reading (and writing) unformatted text files are fgetc/fputc, fgets/fputs, and fread/fwrite. fgetc reads in one character at a time, fgets one line at a time and fread some number of ‘records’ at a time. As a record is just a sequence of bytes of a defined length, what fread really let you do is read in data in chunks whose size are recordsize * number of records. The respective put/write functions do the reverse.

Continue reading →

Detecting Memory Leaks in C

Posted on 16 March, 2011 by dara

Memory leaks are a common occurrence in C programs, particularly for novice programmers. Because memory allocation and deallocation is left to the programmers, it’s not difficult to lose track of what was allocated where, and forget to deallocate it once it is no longer in use. The is particularly true for more complex data structures which may have dynamically allocated blocks of memory that reference other dynamically allocated blocks of memory.

There are a fair number of tools to deal with this, like valgrind. Still, building a simple detector is a relatively straightforward exercise, and can often reveal interesting things.

Continue reading →

Programmer vs. Computer Time

Posted on 13 March, 2011 by dara

I’ve always been predominantly a C programmer, mainly simply from habit. However, recent projects using Perl have me thinking about the tradeoffs involved. The big issue seems to be time – both at the programmer’s end and at the computer’s.

C compared to languages like Java or Perl generally offers greater flexibility and performance. On the flip side, it makes programming tasks more complex and debugging more difficult. C requires more time from the programmer, in order to minimize time spent running on the computer. Higher-level languages on the other hand, require less from the programmer, at the expense of longer run-time.

Continue reading →

C – not a good starting language

Posted on 11 March, 2011 by dara

In most computer science programs, the first course in the sequence is a basic programming course. Usually, this course introduces the standard concepts, along with one of the more common programming language. And at least from what I’ve seen, that language is usually C.

From what I’ve seen in the classes I’ve taken, and particularly from what I’ve seen tutoring at the college this quarter, this is a mistake. Don’t get me wrong, C is a great programming language. I learned on it. I like it. I use it regularly. But as a first programming language for beginning students it leaves a lot to be desired.

Continue reading →

Finding Memory Errors in C

Posted on 9 March, 2011 by dara

Memory errors are among the nastier programming mistake one can make when using C. Unlike syntax errors, the compiler doesn’t flag them for you. Unlike basic logic errors, you won’t necessarily see the problem in the output. The result – silent data corruption – usually manifests itself differently under different systems and compilers which makes it especially obnoxious to deal with.

Fortunately, there are a couple of good tools for debugging memory errors. My favorite is valgrind, a memory debugger modeled after Rational’s purify (the commercial standard) that runs on Linux and now (experimentally) on MacOS X.

Continue reading →

Left 404

Adrift in the 21st Century

Category Archives: Programming

Optimizing Huffman, part 2

Profiling and optimizing Huffman

A simple Huffman implementation

Lightroom oddity

Leaker: a simple C leak detection example

Disk I/O in C – avoid fgetc/fputc

Detecting Memory Leaks in C

Programmer vs. Computer Time

C – not a good starting language

Finding Memory Errors in C