Archive for the 'Programming' Category

strlang

strlang – a simple language for string manipulation

strlang is a programming language I created with the goal of making string manipulation simple and straightforward.  It is an imperative language with a minimalist syntax.  The language and its compiler were written as part of the Programming Languages and Translators (COMS 4115) course at Columbia University in Fall 2011.

Features

  1. Basic data types are strings, numbers and maps (sets of key-value pairs)
  2. Full-set of operators for arithmetic, string manipulation (including basic regular expressions) and map construction
  3. C-like structure including functions, loops, conditionals and expressions
  4. No keywords

 

Continue reading ‘strlang’

Bytecode v. Native Code – calculating pi

Prologue

Like many folks who began using Java around ten years ago, I found it very useful with two exceptions: it forced you to use objects for everything, and it was slow.

Since then, nothing much has changed about the object-oriented nature of Java, although some of the more annoying limitations of the early days have been more or less dealt with.

Continue reading ‘Bytecode v. Native Code – calculating pi’

mhz

While computers have changed a lot in the last 20 years, we still use clock speed (MHZ or GHZ) as the primary metric for describing speed.  Unfortunately, the manufacturers don’t make it easy.  First AMD and then Intel switched away from labeling their processors by clock.  Thus if you purchase a new machine today, the processor is likely to be a ‘Core i3 2100′ or a ‘Phenom X4 2200′.  The numbers that they use after the processor type aren’t the clock speed, rather, they’re some sort of internally-designated model number.

Continue reading ‘mhz’

MacOS X 10.7 compilers

Compile window

As part of a little project to make detecting memory/pointer errors easier for beginning C/C++ programmers, I’ve installed a number of different compilers on my system.  I wanted to make sure that my approach was widely applicable.

At this point, there are 4 (3.5 really) major C/C++ compilers available for MacOS X 10.7.  What follows is a brief description of each, and some background as to how we got here.

Continue reading ‘MacOS X 10.7 compilers’

Building gcc on MacOS X

Apple has never included a stock version of GCC with their development tools, but now with the current version of Xcode, they don’t even include a modified version.  Seeing as their plan going forward is to move entirely to Clang/LLVM, if you intend to use GCC on OS X, you’ll have to build it yourself.  It’s not a terribly difficult process, but it can be a bit tricky the first few times around, particularly when it comes to configuring the build.

Instructions follow.

Continue reading ‘Building gcc on MacOS X’

Notes on Building a C/C++ Leak Detector

Building a simple C memory leak detector is not too difficult.  I did it in a previous post in less than 200 lines.  But once you add features, it becomes a lot more complicated.

What features am I talking about?

1) Allow use in programs with more than 1 source file.
2) Make fast enough to use in reasonably large programs.
3) C++ support (track allocations made with new and delete).
4) Detect use of mismatched allocation/deallocation (e.g. new and free).
5) Detect simple buffer overflows (writing off the end of an array). 
6) Be generally helpful (errors should be handled gracefully). 
7) Test everything well.

And of course it goes without saying that I wanted to keep it portable, and as simple as possible.

Continue reading ‘Notes on Building a C/C++ Leak Detector’

Optimizing Huffman, part 3

So far, we’ve found a number of fairly low-cost (in terms of code and complexity) ways to tune the Huffman program for better performance.

Still, after thinking about the problem for a bit, I did see some other potential areas for improvement.  In particular, the fact that the encoding process was being done essentially bit-by-bit (appending 1 bit at a time to the buffer) seemed inefficient.  If there was only some way to append a whole bunch of bits at once (i.e. the entire encoding for a given character)…

simplehuffman4.c does precisely this.

Continue reading ‘Optimizing Huffman, part 3′

Optimizing Huffman, part 2

In spite of the improvements made to the Huffman program’s use of I/O functions in the previous post, it’s fairly clear that more can be done.  In particular, looking at the profiles for encoding and decoding, the fgetc/fputc pair still account for a significant amount of the program’s runtime.  While we succeeded in getting rid of them for reading and writing normal (unencoded) data, we still used them for the encoded data.  It turns out that this isn’t too hard to fix either.

simplehuffman3.c implements this idea.

Continue reading ‘Optimizing Huffman, part 2′

Profiling and optimizing Huffman

The Huffman example I posted last month has the virtue of being quite simple.  On the flip-side, it has some room for improvement, particularly when it comes to performance.

Of course no less a computer scientist than Don Knuth once said that “Premature Optimization is the root of all evil.”  So it makes sense to look for the most obvious bottlenecks first.

simplehuffman2.c is the optimized version.  A description of the approach follows.

Continue reading ‘Profiling and optimizing Huffman’

A simple Huffman implementation

Huffman encoding uses some interesting insights about data to achieve compression.  It’s reasonably simple, and exposes some interesting questions about data representation and optimization.

simplehuffman.c – (300 lines of C) is a program I wrote in an afternoon to do Huffman encoding and decoding of files.  It will work on any type of file and can compress and decompress a 15MB text file in under 2 seconds on current hardware.

Continue reading ‘A simple Huffman implementation’