tbp wrote:

On 4/29/05, Uros Bizjak <[EMAIL PROTECTED]> wrote:


Hello Scott!


Hello Scott & Uros,



Specifically, the -funsafe-math-optimizations flag doesn't work
correctly on AMD64 because the default on that platform is
-mfpmath=sse. Without specifying -mfpmath=387,
-funsafe-math-optimizations does not generate inline processor
instructions for most floating-point functions.


[snip]


It was found that moving data from SSE registers to X87 registers (and
back) only to call an x87 builtin degrades performance. Because of this,
x87 builtins are disabled for -mfpmath=sse and a normal libcall is
issued for sin() and similar functions. If someone wants to use x87 builtins,
then _all_ math operations should be done in x87 registers to avoid
costly SSE->x87 moves.



Shameless plug with my own performance analysis regarding SSE on x86-64. I've ported my coherent raytracer, which mostly uses intrinsics in the hot path (and no transcendentals). While gcc 4.x compiled binaries are ~5% slower than those compiled with icc 8.1 on ia32 (best case), it's the other way around on x86-64, if not more so (on my Opteron, with icc 8.1 and the 9.0 beta). Obviously there's much less pressure on the (cough, weak, cough) register allocator, and in the end the generated code is way leaner.

My only gripe with fast-math is that it's the only way to enable some
optimizations while making NaNs verboten; couple that with the lack
of cross-unit IPO and you're stuck with a kind of nasty "global"
switch (unless you have room for some function calls).
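To illustrate the NaN side of that gripe: with -ffast-math the compiler is allowed to assume NaNs never occur, so a test like x != x may be optimized away. One possible workaround, sketched here under the assumption of 32-bit IEEE-754 floats, is to inspect the bit pattern directly. The helper names float_from_bits and is_nan_bits are hypothetical, not something from the raytracer or from GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from a raw bit pattern (assumes 32-bit IEEE-754). */
static float float_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* NaN test that uses only integer operations, so it does not rely on
 * floating-point comparisons that fast-math may assume away.
 * A NaN has all exponent bits set and a non-zero mantissa. */
static int is_nan_bits(float x)
{
    uint32_t bits;
    assert(sizeof x == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7f800000u) == 0x7f800000u
        && (bits & 0x007fffffu) != 0;
}
```

This only detects NaNs that already exist; it does nothing about the arithmetic itself being compiled under the assumption that they don't.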


Granted, POV-Ray may not be state-of-the-art, but then, I know quite a few people who say that (even legitimately) about just about every software product in existence.

If you have a suggestion for better benchmarks, I'm listening. Is your ray tracer available?

..Scott
