Nasm is a cross-platform x86 assembler and works largely the same on Mac OS X as it does on Linux. However, as I discovered while porting some of my code, there are a few quirks, and some slight changes are required compared to working with nasm on Linux.
1) Mac OS X uses an object file format called Mach-O. To output Mach-O from nasm, use the option ‘-f macho’ rather than ‘-f elf’ like on Linux.
2) C symbols on OS X are automatically prefixed with an underscore (i.e. printf is really _printf). As nasm does not do this by default, you will need to add the option ‘--prefix _’ in almost all cases when assembling.
3) OS X’s compiler and linker default to emitting 64-bit code (at least on 10.6 and later). Accordingly, when linking 32-bit code with gcc, use ‘-m32’.
So building a simple assembly program on OS X means doing something like:
nasm --prefix _ -f macho -o program.o program.asm
gcc -m32 -o program program.o
4) The stack on OS X must be 16-byte aligned whenever a function is called, and it is up to you to keep it that way. Bad things happen when the stack gets misaligned. What this means in practice is that when you enter or exit a function, you may need to do extra work. In particular, keep in mind that the call instruction always pushes a 4-byte return address, which subtracts 4 from the stack pointer.
Consider this example:
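SECTION .data
SECTION .text
global main
extern putc             ; defined in the C library, so extern rather than global
main:
    push ebp            ; 4 bytes
    mov eax, 'A'
    push eax            ; 4 bytes: the argument
    call putc           ; 4 more bytes: the return address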
…
The OS X ABI requires esp to be 16-byte aligned at the moment a call instruction executes. So when we enter at main, right after our caller's call has pushed its 4-byte return address, esp % 16 = 12. The first push subtracts 4 from esp, so esp % 16 = 8. The second push does the same thing, making esp % 16 = 4. That is where esp sits when our own call instruction executes. The call then automagically pushes the return address onto the stack, so esp is decremented by 4 again, and putc starts executing with esp % 16 = 0 rather than the 12 it expects.
Thus when putc executes, the stack is not aligned the way it assumes, and this almost invariably results in a segmentation fault.
The fix is not very complicated. We need to subtract 4 more from esp before pushing the argument, so that esp is 16-byte aligned when the call instruction executes and the argument is still on top of the stack. Then after returning to main, we of course need to add those 4 bytes back to esp as part of our cleanup.
The essential trick is to remember that every push, and the call itself, moves esp down by 4 bytes. A function is entered with esp % 16 = 12, so to reach esp % 16 = 0 at the next call instruction, the bytes you push or reserve between entering a function and calling another must add up to 12 modulo 16 (12, 28, 44 and so on).
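Put together, the fix might look like the sketch below. It is only an illustration under a few assumptions that are not part of the original example: putchar is swapped in for putc because putchar takes a single argument (putc also expects a FILE * stream), and the mov ebp, esp / pop ebp frame handling and the return value of 0 are added so that main can return cleanly. The point to notice is the byte count: saved ebp, padding, argument and return address add up to 16, so the callee sees the same alignment that main saw on entry.

```nasm
; Build with the same options as above:
;   nasm --prefix _ -f macho -o program.o program.asm
;   gcc -m32 -o program program.o
SECTION .text
global main
extern putchar          ; assumption: putchar swapped in for putc

main:
    push ebp            ; 4 bytes: save the caller's frame pointer
    mov ebp, esp
    sub esp, 4          ; 4 bytes of padding, reserved before the argument
    mov eax, 'A'
    push eax            ; 4 bytes: the argument to putchar
    call putchar        ; call pushes a 4-byte return address: 16 bytes in total
    add esp, 8          ; drop the argument and the padding
    xor eax, eax        ; return 0 from main
    pop ebp
    ret
```

Reserving the padding before the push keeps the argument on top of the stack, where the callee looks for it. With more arguments the same counting applies: pick the padding so that everything between function entry and the call instruction totals 12 modulo 16.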