Hi, Uros.

> Due to outdated i386 ABI, where all FP parameters are
> passed on stack, SSE code does not show all its power when
> used. When math library function is called, SSE regs are
> pushed on stack and called math library function (that is
> currently implemented again with i387 insns) pulls these
> values from stack to
> x87 registers. In contrast, x86_64 ABI specifies that FP
> values are passed in SSE registers, so they avoid costly SSE
> reg->stack moves. Until i386 ABI (together with supporting
> math functions) is changed to something similar to x86_64,
> use of -mfpmath=sse won't show all its power.

Actually, in many cases SSE did help x86 performance as well. That
happens in FP-intensive applications that spend a lot of time in loops,
where the XMM register set can be used more efficiently than the x87
stack.

> Another fact
> is, that x87 intrinsics are currently disabled for
> -mfpmath=sse, because it was shown that SSE math libraries
> (with SSE ABI) are faster for x86_64.

That's because the x87 microcode operates in 80-bit precision, whereas
the SSE routines operate in just 32- or 64-bit precision. Yet their
precision is better over their domains.

> These functions
> interfere with gcc's builtins, so -D__NO_MATH_INLINES is
> needed to fix this problem. The situation is even worse when
> SSE code is used. Asm inlines from mathinline.h are
> implemented using i387 instructions, so these instructions
> force parameters to move from SSE registers to x87 regs (via
> stack) and the result to move back to SSE reg the same way.
> This can be seen when sin(a + 1.0) is compiled with math.h
> header included. With -mfpmath=sse, SSE->mem->FP reg moves
> are needed to satisfy constraints of inlined sin() code.

Yep, that's a killer, especially on x86_64, which defaults to
-mfpmath=sse...

Evandro