Following up on my previous post, I’ve posted right below an actual working example of a simple C leak detector. It pretty much follows the outline given in that post. There is a global table to keep track of allocations and deallocations. Anything left in the table at the end of the program is known to have been leaked. By recording filenames and line numbers as well as memory addresses, the table makes it possible to know exactly where leaked allocations were made.
Leaker code. ~180 lines of ANSI C.
The nice thing about this approach is that it requires only the most minimal modification of user source code to be invoked. Add #include "leaker.h" to the top of your source files, and you’re set to go.
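To make the include trick concrete, here is a minimal sketch of how a header like this might redirect calls. It is not the actual leaker source: the names `wrap_malloc`, `wrap_free` and `wrap_live_blocks` are invented for illustration, and a real header would record each call in the table rather than just counting and printing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names, for illustration only (not taken from leaker). */
static size_t wrap_live_blocks = 0;   /* allocations not yet freed */

void *wrap_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);   /* the real malloc: the macro is not defined yet */
    if (p != NULL) {
        wrap_live_blocks++;
        fprintf(stderr, "alloc %p (%lu bytes) at %s:%d\n",
                p, (unsigned long)size, file, line);
    }
    return p;
}

void wrap_free(void *p, const char *file, int line)
{
    if (p != NULL) {
        wrap_live_blocks--;
        fprintf(stderr, "free  %p at %s:%d\n", p, file, line);
    }
    free(p);                  /* the real free */
}

/* From this point on, the preprocessor rewrites every malloc()/free()
   call so that it also passes the caller's file name and line number. */
#define malloc(size) wrap_malloc((size), __FILE__, __LINE__)
#define free(ptr)    wrap_free((ptr), __FILE__, __LINE__)
```

The ordering matters: the wrappers are defined before the macros, so inside them `malloc` and `free` still refer to the standard library functions. User code that comes after the macros gets the wrapped versions without being edited.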
Beyond just leaks, the lookup table can catch errors in deallocation and reallocation. For example, free()ing the same address twice or attempting to realloc() a pointer that was not allocated in the first place is easily caught because the custom functions overriding free() and realloc() always attempt to first look up the pointers they are given in the table.
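As a rough sketch of that lookup step, assuming a fixed-size array table (the names `tbl_record`, `tbl_release` and `TBL_MAX_ENTRIES` are made up here and are not leaker's own), a release function can report a double free or a bogus pointer simply by failing to find the address:

```c
#include <stdio.h>
#include <stddef.h>

#define TBL_MAX_ENTRIES 1024          /* illustrative capacity */

typedef struct {
    void       *addr;                 /* address returned by the allocator */
    const char *file;                 /* where the allocation was made */
    int         line;
} tbl_entry;

static tbl_entry tbl_entries[TBL_MAX_ENTRIES];
static size_t    tbl_count = 0;

/* Linear search: returns the entry's index, or -1 if addr is not tracked. */
static long tbl_find(const void *addr)
{
    size_t i;
    for (i = 0; i < tbl_count; i++)
        if (tbl_entries[i].addr == addr)
            return (long)i;
    return -1;
}

/* Record a new allocation. Returns 1 on success, 0 if it cannot be stored. */
int tbl_record(void *addr, const char *file, int line)
{
    if (addr == NULL || tbl_count == TBL_MAX_ENTRIES)
        return 0;
    tbl_entries[tbl_count].addr = addr;
    tbl_entries[tbl_count].file = file;
    tbl_entries[tbl_count].line = line;
    tbl_count++;
    return 1;
}

/* Remove an allocation. Returns 1 if it was tracked, or 0 if the pointer
   is unknown, which means a double free or a pointer never allocated. */
int tbl_release(void *addr, const char *file, int line)
{
    long i = tbl_find(addr);
    if (i < 0) {
        fprintf(stderr, "%s:%d: pointer %p is not in the table\n",
                file, line, addr);
        return 0;
    }
    tbl_entries[i] = tbl_entries[--tbl_count];   /* fill the gap with the last entry */
    return 1;
}
```

A wrapper for realloc() would do the same lookup first, then replace the stored address with the new one if the call succeeds.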
The only real downside to leaker is that the implementation shown here uses a simple array to store entries. For a small number of allocations, this is fine, but for big programs, finding addresses in the table for removal or updating will become painfully slow, since each lookup may have to scan the whole array. This can be addressed reasonably easily by using a hash table rather than an array to store allocation information. Unlike an array, a well-maintained hash table suffers only minimal performance loss as the number of entries grows.
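For illustration, one possible shape for such a hash table is sketched below, using separate chaining keyed on the address. This is an assumption about how it could be done, not part of leaker; the `ht_` names, the bucket count and the shift by 4 bits are all arbitrary choices.

```c
#include <stdlib.h>

#define HT_BUCKETS 1024               /* illustrative size */

typedef struct ht_node {
    void           *addr;             /* tracked allocation */
    const char     *file;
    int             line;
    struct ht_node *next;             /* next entry in the same bucket */
} ht_node;

static ht_node *ht_buckets[HT_BUCKETS];

static size_t ht_index(const void *addr)
{
    /* Heap blocks are aligned, so the lowest bits carry little information;
       shift them away before reducing to a bucket number. */
    unsigned long key = (unsigned long)addr;
    return (size_t)((key >> 4) % HT_BUCKETS);
}

/* Returns 1 on success, 0 if the node could not be allocated.
   In a detector that redefines malloc, this must call the real malloc. */
int ht_insert(void *addr, const char *file, int line)
{
    size_t i = ht_index(addr);
    ht_node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->addr = addr;
    n->file = file;
    n->line = line;
    n->next = ht_buckets[i];
    ht_buckets[i] = n;
    return 1;
}

/* Returns the entry for addr, or NULL if it is not tracked. */
const ht_node *ht_find(const void *addr)
{
    const ht_node *n;
    for (n = ht_buckets[ht_index(addr)]; n != NULL; n = n->next)
        if (n->addr == addr)
            return n;
    return NULL;
}

/* Returns 1 if addr was tracked and removed, 0 otherwise. */
int ht_remove(const void *addr)
{
    ht_node **link = &ht_buckets[ht_index(addr)];
    while (*link != NULL) {
        if ((*link)->addr == addr) {
            ht_node *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}
```

Each lookup now only walks the entries that share a bucket, so as long as the bucket count keeps pace with the number of live allocations, the cost of a lookup stays roughly constant.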
There is one type of nasty error that leaker does not attempt to address: reads and writes to invalid addresses. Unfortunately, this is a much more complicated problem, as we can’t use any preprocessor magic to trap read and write operations. Unlike allocations and deallocations in C, which are managed by a small handful of functions, read and write operations basically happen everywhere. Moreover, even if we could somehow trap all such operations, the performance hit of verifying that every read and write was to an allocated block of memory would be quite high. Tools like valgrind do this checking by essentially running the program in a small emulator.
There is one relatively common type of invalid access that can be detected reasonably easily: writing off the end of an array. This is the case where you allocate n elements in your array, but then attempt to write to n+1 or more elements. Here, the trick is to allocate a little extra space at the end of each array, and set that part to some known value. When the block of memory gets deallocated, you can check to see that the values in the extra part have not changed. The downside is you won’t know exactly where the incorrect write occurred, but you will at least know that it happened somewhere before the line that deallocates the block.
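Here is a small sketch of the guard-bytes idea, under the assumption that the caller still knows the requested size at check time (a real detector would more likely store the size in its table). The names `guard_alloc` and `guard_intact`, the 8-byte guard and the 0xA5 fill value are all illustrative choices.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8                  /* extra bytes appended to each block */
#define GUARD_BYTE 0xA5               /* known value written into the guard */

/* Allocate `size` usable bytes followed by GUARD_SIZE guard bytes. */
void *guard_alloc(size_t size)
{
    unsigned char *p;
    if (size > (size_t)-1 - GUARD_SIZE)   /* avoid overflow in the addition */
        return NULL;
    p = malloc(size + GUARD_SIZE);
    if (p == NULL)
        return NULL;
    memset(p + size, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Returns 1 if the guard bytes after the block are untouched, 0 if
   something wrote past the end. Call this just before freeing the block. */
int guard_intact(const void *block, size_t size)
{
    const unsigned char *g = (const unsigned char *)block + size;
    size_t i;
    for (i = 0; i < GUARD_SIZE; i++)
        if (g[i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

For example, if a program asks for 4 bytes with `guard_alloc(4)` and then writes a fifth byte at index 4, `guard_intact(block, 4)` returns 0 when the block is checked. Overruns longer than the guard area can still go on to corrupt other memory, so this only catches the common small cases.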
(This can be surprisingly important. One student I was helping was allocating a large number of structures holding strings. His program was mysteriously crashing near the end when he deallocated those structures. After considerable headscratching, we discovered that in the early part of the program, he was accidentally writing 1 character too many to every string he was allocating. When we fixed that, everything worked fine. It took us a while to find though, because the deallocation that was crashing the program was not the one that dealt with the strings. Moral of the story – corrupting heap memory can have strange and unhelpful side-effects!)