tbp wrote:
Shameless plug with my own performance analysis regarding SSE on x86-64.
I've ported my coherent raytracer, which mostly uses intrinsics in the
hot path (and no transcendentals).
While gcc4.x-compiled binaries are ~5% slower than those compiled with
icc8.1 on ia32 (best case), it's the other way around on x86-64, if not
more (on my Opteron with icc8.1 and beta 9.0).
Obviously there's much less pressure on the (cough weak cough)
register allocator, and in the end the generated code is way leaner.

You might want to take a look at my just-published review of GCC 4.0, where I compare its performance on some well-known applications, including LAME and POV-Ray, on Pentium 4 and Opteron. For POV-Ray, 4.0 produced a smaller executable that was slightly slower than the one produced by 3.4.3. You can find the full review at:
http://www.coyotegulch.com/reviews/gcc4/index.html
..Scott