MacOS X and Distcc

Building a MacOS X cross-compiler for use with distcc.

With thanks to Benjamin Reed and xwt.org for their work and hints on the topic.

The Problem

You’re trying to set up some open source software. Unfortunately, since it is open source and you are not running Linux on x86, there are no precompiled binaries. So it’s time to dust off Xcode and let the old machine churn away for a few hours.

Compilation is typically slow, and on MacOS X it’s doubly (literally) so. Note: with the arrival of Apple’s Intel Macs, compilation speed is much less of a problem.

The Solution

Distcc lets you, with a minimal amount of headache, harness the computing power of any other box on your local network that runs some *nix variant. It simply acts as a jobserver for make, handing out compilation tasks to other boxes. In my case, the university has a few nice Xeons with Ubuntu, so suddenly my 667MHz TiBook can actually do stuff again.

But you do need a cross compiler (and assembler) in addition to distcc for this to work. Since gcc takes quite a while to build, and the amount of work to get a full gcc cross-compiler built is significant, I’ve included instructions here on how to build the minimal tools necessary to take advantage of distcc. This will NOT give you a full cross-compiler, merely the necessary pieces to cross-compile through distcc.

Building and installing the cross compiler (for MacOSX 10.4)

First order of business: get a cross compiler installed on the remote box (or boxes) that’ll be doing the heavy lifting.

0) Decide where you’re going to install the cross tools, and set up the necessary directories. Also, determine if you’ll be doing ppc builds (TARGET=powerpc-apple-darwin8), x86 builds (TARGET=i686-apple-darwin8), or both.

export PREFIX=_BASE_OF_INSTALL_DIRECTORY_
export TARGET=_CHOICE_OF_TARGET_
mkdir -p $PREFIX/bin
mkdir -p $PREFIX/$TARGET/bin
mkdir -p $PREFIX/libexec/gcc/$TARGET/4.0.1
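
If you plan to build for both targets, the directory setup can be wrapped in a small helper. This is only a sketch: the function name make_cross_dirs is my own invention, and the prefix in the usage line is a hypothetical example.

```shell
# Create the step 0 directory layout under one prefix for any number of targets.
# make_cross_dirs PREFIX TARGET [TARGET...]
make_cross_dirs() {
    prefix="$1"
    shift
    mkdir -p "$prefix/bin"
    for target in "$@"; do
        mkdir -p "$prefix/$target/bin" \
                 "$prefix/libexec/gcc/$target/4.0.1"
    done
}

# Example (hypothetical prefix):
#   make_cross_dirs "$HOME/darwin-cross" powerpc-apple-darwin8 i686-apple-darwin8
```

You still need to export PREFIX and TARGET for whichever target you build next, since the later steps rely on them.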

1) Download the appropriate versions of the OpenDarwin cctools (odcctools) and Apple’s version of the GNU Compiler Collection (gcc). The gcc version MUST be the same as the one installed on your Mac.

wget www.opendarwin.org/downloads/odcctools-20060413.tar.bz2
wget www.opensource.apple.com/darwinsource/tarballs/other/gcc-5250.tar.gz
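
To find out which Apple build you need, look at the banner printed by gcc --version on the Mac. The helper below is a sketch that pulls the build number out of such a banner. It assumes the banner contains the text "build NNNN", which is how Apple's gcc 4.0.1 reports itself as far as I know, and the function name apple_gcc_build is made up for this example.

```shell
# Extract the Apple build number from a `gcc --version` banner.
# Assumption: the banner contains "build NNNN" somewhere on one line.
apple_gcc_build() {
    printf '%s\n' "$1" | sed -n 's/.*build \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# On the Mac:
#   apple_gcc_build "$(gcc --version)"
```

The number it prints should match the number in the gcc tarball name (5250 in the download above).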

2) Build and install odcctools. All we really want is the assembler, so we can shorten things a bit. For the install bit, change ppc to i386 if appropriate.

tar -xjvf odcctools-20060413.tar.bz2
cd odcctools-20060413
./configure --target=$TARGET
make -C libstuff
make -C as

install -m 755 as/ppc/as $PREFIX/$TARGET/bin

3) Build and install gcc. All we need are the compilers themselves, and the compiler driver. In particular we ignore all the libraries that gcc usually builds (libstdc++, libgcc, etc). Also, distcc only supports C, C++, Objective C and Objective C++, so we disable Fortran, Java etc.

tar -xzvf gcc-5250.tar.gz
mkdir objdir
cd objdir
../gcc-5250/configure --target=$TARGET --disable-shared \
    --disable-nls --enable-languages=c,c++,objc,obj-c++ \
    --with-as=$PREFIX/$TARGET/bin/as
make -k configure-host maybe-all-gcc

At this point, you'll have some error message about missing headers. Ignore it. Make sure that the files gcc/{cc1,cc1obj,cc1plus,cc1objplus,xgcc,g++} all exist though.

install -m 755 gcc/{cc1,cc1obj,cc1objplus,cc1plus} $PREFIX/libexec/gcc/$TARGET/4.0.1
install -m 755 gcc/xgcc $PREFIX/bin/$TARGET-gcc-4.0.1
install -m 755 gcc/g++ $PREFIX/bin/$TARGET-g++-4.0.1
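
Since make -k carries on past errors, it is worth checking for the build products mechanically before running the install commands above. A minimal sketch, with a function name (check_gcc_products) of my own choosing, meant to be run from objdir:

```shell
# Report any expected build product that is missing or not executable.
# check_gcc_products [DIR]   (DIR defaults to ./gcc)
# Returns non-zero if anything is missing.
check_gcc_products() {
    dir="${1:-gcc}"
    missing=0
    for f in cc1 cc1obj cc1plus cc1objplus xgcc g++; do
        if [ ! -x "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_gcc_products gcc && echo "all compiler pieces were built"
```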

4) Test your installation.

echo "int main(int argc, char *argv[]) { return 0; }" > test.c
$PREFIX/bin/$TARGET-gcc-4.0.1 -c test.c

There should be no errors. An error of “Fatal error: invalid listing option `r'” means you didn’t install the assembler properly (i.e. gcc is trying to use the Linux assembler).

file test.o

test.o should be a Mach-O object file.
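
If you want to script this check (say, across several build boxes), it is enough to look for the string Mach-O in whatever file prints. The exact wording of file's output varies between versions, so this sketch only matches that one word. The function name is hypothetical.

```shell
# Succeeds if a `file` description mentions Mach-O.
is_macho_description() {
    case "$1" in
        *Mach-O*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example on the build box:
#   is_macho_description "$(file test.o)" && echo "cross tools look OK"
```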

5) Set things up on the MacOS X side. There are a few wrinkles, and a number of ways to make future builds using distcc easier. Distcc can be invoked with ‘distcc $TARGET-gcc’ (or g++). However, it’s nice to have it picked up automatically. To that end, I suggest creating two shell scripts in /usr/local/bin:

gcc:

#!/bin/sh
distcc _REPLACE_WITH_$TARGET_-gcc-4.0.1 -msse2 "$@"

g++:

#!/bin/sh
distcc _REPLACE_WITH_$TARGET_-g++-4.0.1 -msse2 "$@"

The -msse2 bit is a hack, done because Apple’s gcc always passes this option, but the cross-built one won’t. It is an x86 flag, so it only makes sense for the i686 target.
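
Rather than typing the two wrappers by hand, you can generate them. The sketch below writes both scripts for a given target and forwards arguments with "$@", the standard shell form for passing every argument through unchanged. Two things here are my own assumptions and not part of the recipe above: the function name write_wrappers, and the choice to add -msse2 only when the target looks like x86.

```shell
# Write gcc and g++ wrapper scripts for one target into a directory.
# write_wrappers DIR TARGET
write_wrappers() {
    dir="$1"
    target="$2"
    # Assumption: -msse2 is only added for x86 targets.
    case "$target" in
        i?86-*) extra=" -msse2" ;;
        *)      extra="" ;;
    esac
    mkdir -p "$dir"
    for tool in gcc g++; do
        {
            printf '%s\n' '#!/bin/sh'
            # \"\$@\" is written literally so the wrapper forwards its own arguments.
            printf '%s\n' "distcc ${target}-${tool}-4.0.1${extra} \"\$@\""
        } > "$dir/$tool"
        chmod +x "$dir/$tool"
    done
}

# Example (needs write access to /usr/local/bin, e.g. via sudo sh):
#   write_wrappers /usr/local/bin i686-apple-darwin8
```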

Then:

sudo chmod +x /usr/local/bin/{gcc,g++}
sudo ln -s /usr/local/bin/gcc /usr/local/bin/cc
sudo ln -s /usr/local/bin/g++ /usr/local/bin/c++

export PATH=/usr/local/bin:$PATH

6) Deal with some of Apple’s unhelpful defaults. The built-in distcc in 10.4 behaves oddly: it refuses to execute the compiler by its alias ($TARGET-gcc-4.0.1). My solution was to install a fresh copy of distcc in /usr/local/bin. Xcode also likes to use nonstandard compiler names (it explicitly invokes /usr/bin/gcc-4.0 and /usr/bin/g++-4.0 when building Camino, for example), so you may also want to replace these with symlinks to /usr/local/bin/gcc and /usr/local/bin/g++ respectively.

7) At this point, everything’s ready to go. Fire up the distccd daemon on your cross-build boxes, and add their IP addresses to the DISTCC_HOSTS environment variable. Any project that attempts to invoke gcc, g++ or their standard aliases (cc and c++) should take advantage of distcc. You can use ‘make -j number_of_processes’ to make use of the extra computing power.
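
As an illustration of this last step, here is roughly what the setup looks like. The IP addresses and the subnet are placeholders, and the "two jobs per server" figure is only a rule of thumb I am assuming as a starting point. The helper name suggested_jobs is hypothetical.

```shell
# Placeholder addresses of the cross-build boxes.
DISTCC_HOSTS="192.168.1.10 192.168.1.11"
export DISTCC_HOSTS

# Suggest a -j value: two jobs per listed server (an assumption, tune to taste).
suggested_jobs() {
    # Unquoted on purpose: split the host list into words.
    set -- $1
    echo $(( $# * 2 ))
}

# On each server, start the daemon and allow your subnet (placeholder range):
#   distccd --daemon --allow 192.168.1.0/24
# On the Mac:
#   make -j"$(suggested_jobs "$DISTCC_HOSTS")"
```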

OS X 10.3 Notes

For a MacOS X 10.3 cross compiler, you need Apple’s gcc build 1671 (assuming you’re using the latest November ’04 Devtools). Unfortunately, to build it you’ll need bison 1.28 (newer versions cause errors when building the Objective C++ parser). On the other hand, you can use the same odcctools which is nice.

After decompressing gcc, you need to patch two things. First, remove the file gcc_os-1671/more-hdrs/stdint.h. Second, edit gcc_os-1671/gcc/gcc.c and change line 5833 from “const char *v = compiler_version;” to “char *v = compiler_version;”.
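
Both fixes can be scripted. The sketch below matches the declaration by its text rather than by line number, on the assumption that this exact declaration appears only once in gcc.c. The function name patch_gcc_1671 is made up.

```shell
# Apply the two source fixes to an unpacked gcc_os-1671 tree.
# patch_gcc_1671 SRCDIR
patch_gcc_1671() {
    src="$1"
    # 1. Remove the bundled stdint.h.
    rm -f "$src/more-hdrs/stdint.h"
    # 2. Drop the const qualifier from the compiler_version pointer.
    #    Written via a temporary file so it works with any sed.
    sed 's/const char \*v = compiler_version;/char *v = compiler_version;/' \
        "$src/gcc/gcc.c" > "$src/gcc/gcc.c.new" &&
        mv "$src/gcc/gcc.c.new" "$src/gcc/gcc.c"
}

# Example:
#   patch_gcc_1671 gcc_os-1671
```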

Then you can configure, build and install:

./gcc_os-1671/configure --disable-nls --disable-shared \
    --enable-languages=c,c++,objc,objc++ \
    --target=$TARGET --prefix=$PREFIX --with-as=$PREFIX/$TARGET/bin/as
make all-libiberty
make -C gcc all -k

mkdir -p $PREFIX/lib/gcc-lib/$TARGET/3.3
install -m 755 gcc/{cc1,cc1obj,cc1plus,cc1objplus} $PREFIX/lib/gcc-lib/$TARGET/3.3
install -m 755 gcc/xgcc $PREFIX/bin/$TARGET-gcc-3.3
install -m 755 gcc/g++ $PREFIX/bin/$TARGET-g++-3.3

On the Mac side, you need to install odcctools as well (the 10.3 linker will choke on the object files created by odcctools on the cross-compiler machine), so something along the lines of: ‘./configure && make && sudo make install’ is necessary.

Conclusions

Aside from the obvious advantages of distcc, a few points should be made. In my remarks, distcc ‘servers’ are the machines doing the cross-compiling while the distcc ‘client’ is the machine parceling out the jobs, in this case the Mac.

MacOS X tends to spend a lot of time in system calls, especially compared to Linux, and this somewhat reduces the advantage of distcc. For instance, a regular build (no distcc) of libxml2 on OS X is about 65% slower than on Linux on the same machine (5m30 vs. 3m22), but it uses more than 5X the amount of system time (1m30 vs. 17.7s). The killer here is actually the shell script ‘libtool’. In that type of situation, distcc can only help so much.

Network latency (and to a lesser degree bandwidth) is a crucial issue. If the files being transferred are small, then most of the time spent in distcc will actually be spent sending files from client to server and back again, not compiling. Larger source files (and those that require more work to compile, like C++) spend less time, relatively, in transit, so the advantage is more substantial.

Poorly written makefiles can also slow things down a lot. Some are written in such a way that files are required to be compiled sequentially, which makes it impossible to take advantage of multiple distcc servers.

Distcc is very effective at reducing load on the client. Thus while the compiles may not always be a lot faster, the amount of work the client does is substantially less, and so if you’re trying to, say, do some Photoshop work at the same time, it is a much more pleasant experience when distcc is parceling out the heavy lifting to other machines.

The faster your servers are the better. If they’re slower than your client, distcc is unlikely to make a substantial difference, and can even make things slower.

Caveats

Some programs have poorly constructed makefiles, so ‘make -j2’ (or higher) will not complete successfully.

Distcc can only speed up compilation when the compiling and linking stages are separate. In particular, distcc takes in source, preprocesses it locally and compiles and assembles it remotely (if it can). In other words ‘gcc *.c -o foo’ will not benefit, but ‘gcc -c *.c; gcc *.o -o foo’ will.

Use distcc only on trusted networks or over ssh (although this will also slow performance).

Errors along the lines of “Fatal error: invalid listing option `r'” are caused by gcc picking up the wrong assembler (the Linux GNU assembler instead of the Darwin GNU assembler). You will need to adjust the PATH environment variable accordingly.
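
To see which assembler a given cross gcc will actually run, you can ask the driver with -print-prog-name, a standard gcc driver option. A small sketch (the function name assembler_for is hypothetical):

```shell
# Print the assembler that a given gcc driver will invoke.
# assembler_for COMPILER
assembler_for() {
    "$1" -print-prog-name=as
}

# Example on the build box:
#   assembler_for "$PREFIX/bin/$TARGET-gcc-4.0.1"
```

If this prints a path under $PREFIX/$TARGET/bin, the Darwin assembler is being used. If it prints a bare "as", gcc will search PATH and will most likely find the Linux assembler first.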

YMMV. Comments, bug-reports and suggestions are of course welcome.