Re: GCC 4.0, Fast Math, and Acovea

Scott Robert Ladd Tue, 03 May 2005 13:49:16 -0700

tbp wrote:

On 4/29/05, Uros Bizjak <[EMAIL PROTECTED]> wrote:

Hello Scott!
Hello Scott & Uros,
Specifically, the -funsafe-math-optimizations flag doesn't work correctly on AMD64 because the default on that platform is -mfpmath=sse. Without specifying -mfpmath=387, -funsafe-math-optimizations does not generate inline processor instructions for most floating-point functions.

[snip]

It was found that moving data from SSE registers to x87 registers (and back) only to call an x87 builtin degrades performance. Because of this, x87 builtins are disabled for -mfpmath=sse, and a normal libcall is issued for sin() and similar functions. If someone wants to use x87 builtins, then _all_ math operations should be done in x87 registers to avoid costly SSE->x87 moves.
Shameless plug with my own performance analysis regarding SSE on x86-64.
I've ported my coherent raytracer which mostly uses intrinsics in the
hot path (and no transcendentals).
While gcc4.x compiled binaries are ~5% slower than those compiled with
icc8.1 on ia32 (best case), it's the other way around on x86-64 if not
more (on my opteron with icc8.1 and beta 9.0).
Obviously there's much less pressure on the (cough weak cough)
register allocator and in the end the generated code is way leaner.
My only gripe with fast-math is that it's the only way to enable some optimizations while making NaNs verboten; couple that with the lack of cross-unit IPO and you're stuck with a kind of nasty "global" switch (unless you have room for some function calls).
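
A small sketch of the NaN problem described above, assuming code that uses NaN as a "no hit" marker. The name is_miss is hypothetical and not taken from any raytracer mentioned here. -ffast-math turns on -ffinite-math-only, which lets the compiler assume that NaNs never occur, so a self-comparison test like this one may be folded away.

```c
/* Illustrative helper: reports whether t is NaN, used here as a
 * hypothetical "ray missed" marker.
 *
 * Under IEEE semantics NaN is the only value that is not equal to
 * itself, so this returns 1 for NaN and 0 otherwise.
 *
 * Built with -ffast-math (which implies -ffinite-math-only), the
 * compiler is allowed to assume t is never NaN and may reduce this
 * function to "return 0".
 */
int is_miss(float t)
{
    return t != t;
}
```

One workaround, at the price of a function call, is to keep NaN-sensitive helpers like this in their own source file and compile that file without -ffast-math.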

Granted, POV-Ray may not be state-of-the-art, but then, I know quite a few people who say that (even legitimately) about just about every software product in existence.

If you have a suggestion for better benchmarks, I'm listening. Is your ray tracer available?

..Scott
