Most programs don’t do much disk I/O, but for those that do, it is often the bottleneck. I discovered this firsthand while working on my Huffman tree encoding program. On small data files, it doesn’t matter much how you read or write your data, but once file sizes reach hundreds of megabytes, the implementation makes a big difference.
The three basic pairs of C functions for reading (and writing) unformatted text files are fgetc/fputc, fgets/fputs, and fread/fwrite. fgetc reads one character at a time, fgets one line at a time, and fread some number of ‘records’ at a time. Since a record is just a sequence of bytes of a defined length, what fread really lets you do is read data in chunks whose size is record size * number of records. The respective put/write functions do the reverse.
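To make the three styles concrete, here is a minimal sketch of a stream-to-stream copy written each way. The function names and buffer sizes are illustrative assumptions, not taken from the original benchmark program.

```c
#include <stdio.h>

/* Copy one character at a time with fgetc/fputc. */
int copy_by_char(FILE *in, FILE *out)
{
    int c;  /* int, not char, so EOF can be told apart from data */
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Copy one line (or at most sizeof buf - 1 characters) at a time. */
int copy_by_line(FILE *in, FILE *out)
{
    char buf[64];  /* hypothetical buffer size */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Copy in chunks with fread/fwrite. The record size is 1, so the
   value fread returns is the number of bytes actually read. */
int copy_by_chunk(FILE *in, FILE *out)
{
    char buf[4096];  /* hypothetical chunk size */
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Each function returns 0 on success and -1 on a read or write error. Note that the fgets/fputs version is only suitable for text: fputs stops at the first zero byte, so data containing zero bytes would not be copied faithfully.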
To get a handle on the differences in terms of performance, I wrote a small program to read in and write back out a 150 MB file using all three methods. The chunk size is just the number of bytes read or written in a single operation. On a relatively old Linux system (1.6 GHz, 160 GB ATA/66 drive), here’s what I found.
Method | Chunk size | Time |
fgetc/fputc | 1 byte | 5.90 |
fgets/fputs | 64 bytes | 1.71 |
fread/fwrite | 1 byte | 18.37 |
fread/fwrite | 4 bytes | 5.22 |
fread/fwrite | 16 bytes | 1.88 |
fread/fwrite | 64 bytes | 1.06 |
fread/fwrite | 256 bytes | 0.79 |
fread/fwrite | 1024 bytes | 0.75 |
fread/fwrite | 4096 bytes | 0.71 |
fread/fwrite | 16384 bytes | 0.64 |
fread/fwrite | 65536 bytes | 0.63 |
fread/fwrite | 262144 bytes | 0.66 |
The bottom line is that if you can read large chunks of data at a time, you’ll get substantial savings. The difference between operating on 16-kilobyte chunks with fread/fwrite and 1-byte chunks with fgetc/fputc is ~9x (0.64 vs. 5.90). fread/fwrite in particular lets you read in chunks of your choosing, and the sweet spot seems to be around 16 KB; larger chunks bring no further gain. Interestingly, fread/fwrite is actually much slower than fgetc/fputc at a chunk size of 1 byte (18.37 vs. 5.90) and only slightly faster at 4 bytes.
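A comparison like this can be reproduced with a small harness along the following lines. This is a sketch under assumptions, not the original benchmark: the function names are hypothetical, and it measures CPU time with clock(), which may differ from however the figures above were timed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Copy in to out using fread/fwrite with a caller-chosen chunk size.
   Returns the number of bytes copied, or -1 on error. */
long copy_chunked(FILE *in, FILE *out, size_t chunk_size)
{
    char *buf = malloc(chunk_size);
    long total = 0;
    size_t n;

    if (buf == NULL)
        return -1;
    while ((n = fread(buf, 1, chunk_size, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            free(buf);
            return -1;
        }
        total += (long)n;
    }
    free(buf);
    return ferror(in) ? -1 : total;
}

/* Time one copy from src to dst at the given chunk size.
   Returns elapsed CPU seconds, or -1.0 if anything fails. */
double time_copy(const char *src, const char *dst, size_t chunk_size)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    double secs = -1.0;

    if (in != NULL && out != NULL) {
        clock_t start = clock();
        if (copy_chunked(in, out, chunk_size) >= 0)
            secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    if (in != NULL)
        fclose(in);
    if (out != NULL)
        fclose(out);
    return secs;
}
```

Calling time_copy in a loop over chunk sizes such as 1, 4, 16, and so on up to 262144 gives one timing per chunk size. Repeated runs over the same file will partly measure the operating system's file cache rather than the disk, so the numbers are best treated as relative.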
On the whole, fgets/fputs with a medium-sized buffer does pretty well. Still, if speed is the primary consideration, fread/fwrite is the better choice, unless you can guarantee that your files have very long lines in them (fgets stops reading at the newline character).
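The newline caveat is easy to see by counting how many fgets calls it takes to consume a stream. In this small sketch (the helper name is hypothetical), a file of short lines needs one call per line no matter how large the buffer is, whereas a file with no newlines is consumed in buffer-sized pieces.

```c
#include <stdio.h>

/* Count how many fgets calls are needed to read the whole stream
   using a buffer of buf_size bytes. Each call returns at most one
   line, or buf_size - 1 characters if the line is longer than that. */
long count_fgets_calls(FILE *in, char *buf, int buf_size)
{
    long calls = 0;
    while (fgets(buf, buf_size, in) != NULL)
        calls++;
    return calls;
}
```

With a 64-byte buffer, a 6-byte file containing three one-character lines takes three calls, while 100 characters with no newline take only two.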
Incidentally, the speedups in the table are about in line with what I observed migrating my Huffman program from single-character I/O to chunk-based I/O.