The short answer is that in my implementation, the 64-bit assembly version ran modestly faster on my machine (64-bit Intel Core i7) than the C code compiled in 64-bit mode with full optimizations. Note that this program simply spits out the total number of primes between 2 and MAX, not the actually primes themselves (often a rather long list).
x86 assembly– written for nasm, should run on Linux and MacOS X
C code – should run anywhere
Notes on the algorithm
The basic trial division algorithm to check if an integer n is prime is to divide it by all possible divisors, from 2 to n-1. If any of those divisions has a remainder of 0, we know n is not a prime. If none of them have remainder zero, n is by definition a prime number.
One rather obvious optimization is to note that if m is the square root of n and n has a divisor greater than m, then n must also have a divisor that is less than m. So if we’ve tried all possible divisors up to and include m, and none is a divisor of n, then n is prime. Another case we can rule out is that if n is not 2 and 2 is not a divisor of n, then no even number is a divisor of n. So after testing 2, we only need to test odd numbers as possible divisors for n.
Another basic property that is helpful is that all non-prime numbers can be expressed as a product of prime numbers, in other words if n is not a prime, it must have prime divisors. And since we already know that one of those divisors must be less than or equal to the square root of n, in reality, we only need to check prime numbers between 2 and sqrt(n) as possible divisors for n.
In practice, it works well to just create a table of known primes as we find them, and use the numbers in that table as possible divisors to test. Note that if we test sequentially (2, 3, 5, 7, 9, …) our table is guaranteed not to have any gaps. For example, if the table contains 2, 3 and 5 and we are testing 31, we try dividing by 2, 3, 5 and then since sqrt(83) > 5, 5+2 = 7 then 7+2 = 9 and 9+2 = 11.