tail from coreutils was compiled with GCC, so for a fair comparison build your program with the same compiler backend (GDC). Any difference that still remains comes from runtime initialization/termination, the overhead of using a wrapper class around the C routines (for the memory mapping or printing to the terminal), and possibly a faster scan routine for line endings in the Linux tool. Implementations that use SSE to scan for \n in blocks of 16 bytes at a time are possible.
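
To illustrate the SSE idea, here is a minimal sketch in C of a backward scan for the last \n, 16 bytes per step. It is not how coreutils does it. The function name find_last_newline_sse2 is made up for this example, and it assumes an x86 target with SSE2 and a GCC-compatible compiler (for __builtin_clz).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: returns the offset of the last '\n' in
 * buf[0..len), or -1 if there is none. Scans backwards, which is the
 * direction a tail-like tool needs. */
ptrdiff_t find_last_newline_sse2(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const __m128i nl = _mm_set1_epi8('\n');  /* 16 copies of '\n' */
    size_t end = len;

    /* Vector part: compare 16 bytes at once, starting from the end. */
    while (end >= 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(buf + end - 16));
        /* Bit i of mask is set when byte i of the block equals '\n'. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(block, nl));
        if (mask != 0) {
            /* Highest set bit = last matching byte in this block. */
            int high = 31 - __builtin_clz(mask);
            return (ptrdiff_t)(end - 16) + high;
        }
        end -= 16;
    }

    /* Scalar part: the fewer than 16 bytes left at the front. */
    while (end > 0) {
        end--;
        if (buf[end] == '\n')
            return (ptrdiff_t)end;
    }
    return -1;
}
```

To get the last N lines you would call this repeatedly, shrinking len to the returned offset each time. Whether it actually beats a plain byte loop or memrchr on your data is something to measure, not assume.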
-- Marco
