Hi, Uros.

> Due to outdated i386 ABI, where all FP parameters are 
> passed on stack, SSE code does not show all its power when 
> used. When math library function is called, SSE regs are 
> pushed on stack and called math library function (that is 
> currently implemented again with i387 insns) pulls these 
> values from stack to x87 registers. In contrast, x86_64 ABI 
> specifies that FP 
> values are passed in SSE registers, so they avoid costly SSE 
> reg->stack moves. Until i386 ABI (together with supporting 
> math functions) is changed to something similar to x86_64, 
> use of -mfpmath=sse won't show all its power.

Actually, in many cases SSE does help x86 performance as well.  That happens in 
FP-intensive applications that spend a lot of time in loops, where the XMM 
register set can be used more efficiently than the x87 stack.
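
To make that concrete, here is a minimal sketch of the kind of loop I mean.  
The function name scale_add is made up for illustration, and the compiler 
flags in the comment are only a suggestion for comparing the generated code 
(-m32 assumes a multilib-capable toolchain); I have no timings to quote for it.

```c
#include <stddef.h>

/*
 * Hypothetical FP-heavy kernel: y[i] += a * x[i].
 *
 * One way to compare the two code generators (an assumption about your
 * setup, not a benchmark):
 *   gcc -m32 -O2 -msse2 -mfpmath=sse -S kernel.c
 *   gcc -m32 -O2 -mfpmath=387 -S kernel.c
 * With SSE math the temporaries live in flat XMM registers; with x87
 * they have to be juggled on the register stack.
 */
void scale_add(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The pointer arguments still travel on the stack under the i386 ABI, but 
everything inside the loop stays in registers, which is where the time goes.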

> Another fact 
> is, that x87 intrinsics are currently disabled for 
> -mfpmath=sse, because it was shown that SSE math libraries 
> (with SSE ABI) are faster for x86_64.

That's because the x87 microcode operates in 80-bit precision, whereas the SSE 
routines operate in just 32- or 64-bit precision.  Even so, the SSE routines 
are more accurate over their domains.

> These functions 
> interfere with gcc's builtins, so -D__NO_MATH_INLINES is 
> needed to fix this problem. The situation is even worse when 
> SSE code is used. Asm inlines from mathinline.h are 
> implemented using i387 instructions, so these instructions 
> force parameters to move from SSE registers to x87 regs (via 
> stack) and the result to move back to SSE reg the same way. 
> This can be seen when sin(a + 1.0) is compiled with math.h 
> header included. With -mfpmath=sse, SSE->mem->FP reg moves 
> are needed to satisfy constraints of inlined sin() code.

Yep, that's a killer, especially on x86_64, which defaults to -mfpmath=sse...

Evandro
