I recently had access to an 18(!)-core machine, so I naturally ran my favorite benchmark, building the clang 3.2 C/C++ compiler, using between 1 and 36 threads. The build scaled quite well. Going from 1 to 6 threads gave a 5.5x speedup, while going from 1 to 12 gave a 9.3x speedup. At 18 threads, the speedup was 11.8x. Beyond 18 threads, there was no further speedup. Given that the makefiles don’t seem to have been tailored specifically to many threads, that’s overall pretty good.
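To put those speedups in perspective, here is a minimal Python sketch that turns them into parallel efficiency (speedup divided by thread count) and an estimated serial fraction using the Karp-Flatt metric. The function names `efficiency` and `karp_flatt` are my own, and the Karp-Flatt estimate is an extra analysis added for illustration, not something measured in the benchmark itself; the only inputs are the three speedups quoted above.

```python
# Speedups quoted in the post, keyed by the number of make threads.
speedups = {6: 5.5, 12: 9.3, 18: 11.8}


def efficiency(speedup, threads):
    """Parallel efficiency: fraction of ideal linear scaling achieved."""
    return speedup / threads


def karp_flatt(speedup, threads):
    """Karp-Flatt metric: experimentally determined serial fraction.

    e = (1/S - 1/p) / (1 - 1/p), where S is the speedup on p threads.
    Assumes p > 1.
    """
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


for threads, speedup in speedups.items():
    print(
        f"{threads:2d} threads: speedup {speedup:4.1f}x, "
        f"efficiency {efficiency(speedup, threads):.2f}, "
        f"estimated serial fraction {karp_flatt(speedup, threads):.3f}"
    )
```

Efficiency comes out at roughly 92% for 6 threads, 78% for 12, and 66% for 18, while the estimated serial fraction creeps up from about 0.02 to about 0.03. A rising serial fraction usually suggests growing overhead (for example contention or serialized steps in the build) rather than a fixed sequential portion, though these three data points alone can't say which.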
Below are the charts, first of time vs. number of threads used by make, then of speedup vs. number of threads used by make.