pi_css5: Compute π FAST

pi_css5 was written by Takuya Ooura and is copyrighted by him, but may be freely distributed and redistributed. In addition to calculating lots of digits of π fast (although other programs are faster), it is extremely portable, which makes it an interesting benchmark for different compilers and hardware, and even for different languages, since I’ve now converted it to C# and Java.


5/20/06 Updated the MacOS X and Linux binaries (for Intel). The MacOS X one is some 10% faster now, courtesy of Intel’s compiler. The Linux one is also faster, albeit by a somewhat smaller (but repeatable) 2%.

5/1/06 Updated the Windows binaries with an SSE2 version as well. That one’s about 20% faster than the generic one (but requires Pentium 4/Athlon XP or newer).

3/18/06 All the binaries have been updated. Most should be fast, and slightly more user friendly (don’t have to run them from the terminal). In particular the MacOSX version runs on any Mac running MacOSX 10.1 or newer, and is optimized for G4, G5 and Apple’s new Intel machines! I’ve no access to Windows right now, so there’s no optimized Windows version (the generic one will work fine). For Linux on x86 machines with SSE2 (read: Pentium 4/Athlon XP or newer), things should be a lot faster.


Performance of pi_css5 on various machines is at the performance page.


Microsoft Windows: Requires Windows 95 or later. Includes a generic version and one optimized for SSE2. pi_css5_windows.zip
Apple MacOS X: Requires MacOSX 10.2 or later. Optimized for PowerPC G4, G5 and Intel based Macs. pi_css5_darwin.tgz

Apple MacOS: Requires MacOS 7.5 or later. Carbon version is for MacOS 8.6 or later. The FAT version supports 68K Macs but requires a floating point unit. The programs must be copied out of the disk image before they can be used. For some reason the FAT version can’t do more than 2 million digits. pi_css5_fatmacos.smi.bin

Linux on x86: Requires glibc 2.1 or newer. SSE2 version requires a Pentium 4/Athlon XP or newer processor. pi_css5_x86linux.tgz

Linux on PowerPC: Requires glibc 2.1 or newer. pi_css5_ppclinux.tgz

Linux on Alpha: Requires glibc 2.1 or newer. pi_css5_alphalinux.tgz

Linux on Itanium: Requires glibc 2.1 or newer. pi_css5_ia64linux.tgz

HP-UX: Requires PA1.1 processor or higher and HP-UX 11.11 or newer. Includes PA 2.0 and IA64 optimized versions. pi_css5_hpux.tgz
Sun Solaris: Requires UltraSPARC processor and Solaris 2.8 or newer. pi_css5_sparcsolaris.tgz
HP Tru64: Requires Tru64 5.1B or newer. Should run on any Alpha, but tested only on an EV67. pi_css5_tru64.tgz
BeOS/ZetaOS/Haiku: Requires BeOS 5.0 or later, running on an Intel processor (no PPC). pi_css5_x86beos.zip
Source Code
C: requires an ANSI C compiler. You’ll want to edit the makefile for your system, in particular the variables CC and CFLAGS. pi_css5_src.tgz (~22k)
Java: requires JDK 1.2 or better to compile, and JRE 1.4 or compatible to run the included bytecode. This code is a manual translation from the C code. pi_cs5_java.tgz (~53k)
C#.NET: can be compiled with any C# compiler. Should be compatible with any runtime. This code is a manual translation from the Java code. pi_css5_csharp.tgz (~28k)
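
For the C source, the CC and CFLAGS edit mentioned above might look roughly like the fragment below. This is only a sketch: the values are assumptions chosen to mirror the generic GCC builds described in the compilation notes, and the rest of the shipped makefile (targets, other variables) is not shown here and may differ.

```makefile
# Hypothetical settings: substitute your own compiler and flags.
# CC is the C compiler to invoke; CFLAGS are the optimization flags passed to it.
CC     = gcc
CFLAGS = -O3 -funroll-loops -fomit-frame-pointer
```

The same values can usually be supplied without editing the file, since variables given on the make command line override assignments in the makefile, for example make CC=gcc CFLAGS="-O3".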

Compilation Notes

Microsoft Windows: pi_css5_sse2 was built with Intel C++ 9.0 using -xN optimizations. Generic version was built with mingw32 and GCC 4.1, using -O3 -funroll-loops -fomit-frame-pointer -mcpu=i686.
Apple MacOS X: Universal binary. Intel version built with Intel C++ 9.1 using -fast optimizations. PowerPC version built with GCC 4.0.1, using -O3 -funroll-loops -fomit-frame-pointer -ffast-math -fprefetch-loop-arrays optimizations for all targets. Additionally, the G4 version uses -mcpu=G4 -maltivec -faltivec and the G5 version uses -mcpu=G5 -fast.

Apple MacOS (Classic): pi_css5 (FAT) was built with MrC 5.0 and Symantec C 8.9 using MPW 3.5 and all optimizations enabled. pi_css5 (Carbon) was built with Metrowerks Codewarrior Pro 8.2 and level 4 optimizations.

Linux on x86: pi_css5.sse2 was built with Intel C++ 9.0 using -xN optimizations, and so requires a machine with SSE2 support (Pentium 4, Athlon XP or newer). Generic version built with GCC 4.2 using the -O3 -funroll-loops -ffast-math -fomit-frame-pointer -mcpu=i686 and -static flags. Both built with dietlibc to reduce size.

Linux on PowerPC: Built with GCC 3.4.4 using the -O3 -funroll-loops -fomit-frame-pointer -mcpu=G3 and -static flags.

Linux on Alpha: Built with GCC 3.4.3 using the -O3 -funroll-loops -fomit-frame-pointer -mcpu=ev67 and -static flags.

Linux on IA64: Built with Intel C++ 9.0 using the -O2 -static flags.

HP-UX: Built with HP aC++ A.03.055 using -fast +Odataprefetch and -Wl,-aarchive and +DA1.1 or +DA2.0 for the PA versions.
Sun Solaris: Built with Sun C 5.7 using -xO5 -fast -Bstatic.
HP Tru64: Compiled using Compaq C++ (unknown version) and the flags -fast -tune ev67 -non_shared.
BeOS/ZetaOS/Haiku: Built with the gcc 3.4.3 toolchain for BeOS, using the -O3 -funroll-loops -fomit-frame-pointer -mtune=pentium4 flags.