------- Comment #52 from whaley at cs dot utsa dot edu 2006-08-09 14:33 -------
Paolo,
>In some sense, this is the peephole I would rather *not* do. But the answer
>is yes. :-)
Ahh, got it :)
>So, do you now agree that the bug would be fixed if the patch that is in GCC
>4.2 was backported to GCC 4.1 (so that your users can use that)?
Well, much as I might like to deny it, yes, I must agree the bug is fixed :) I
think there might still be more performance to get, and initial timings show
that gcc 4 may be slower than gcc 3 on some systems. However, it will also
clearly be faster than gcc 3 on some (so far, most) systems, and so far it is
competitive everywhere, so not even I can call that a performance bug :)
And yes, getting it into the next gcc release would be very helpful for ATLAS.
>And do you still see the abysmal x87 single-precision FP performance?
No, the problems were the same for both precisions. I haven't retimed all the
systems, but here are the numbers I do have for the benchmark:
                      DOUBLE            SINGLE
            PEAK  gcc3/gccS/gcc4    gcc3/gccS/gcc4
            ====  ==============    ==============
Pentium-D : 2800  2359/2417/2067    2685/2684/2362
Ath64-X2  : 5600  3681/4011/2102    3716/4256/2207
Opteron   : 3200  2590/2517/1507    2625/2800/1580
P4E       : 2800  1767/1754/1480    1914/1954/1609
PentiumIII:  500   239/238/225       407/393/283
As you can see, on the benchmark, the single precision numbers are better than
the double now. I cannot get single precision to run at quite the impressive
93% of peak as double when exercising the code generator on the Ath64-X2, but
it gets a respectable 85% of peak (at these levels of performance, it takes
only very minor differences to drop from 93 to 85, so that's not that
unexpected: I am still investigating this).
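
For anyone who wants the benchmark table expressed as a fraction of peak, here is a minimal sketch of the arithmetic. The figures are copied from the table above; the names RESULTS, percent_of_peak and summarize are only illustrative, and this covers the benchmark numbers only, not the 93% and 85% code generator results, which come from separate runs.

```python
# Benchmark figures copied from the table above.
# Each entry: (peak, (double gcc3, gccS, gcc4), (single gcc3, gccS, gcc4)).
# The dictionary layout and all names here are illustrative assumptions.
RESULTS = {
    "Pentium-D":  (2800, (2359, 2417, 2067), (2685, 2684, 2362)),
    "Ath64-X2":   (5600, (3681, 4011, 2102), (3716, 4256, 2207)),
    "Opteron":    (3200, (2590, 2517, 1507), (2625, 2800, 1580)),
    "P4E":        (2800, (1767, 1754, 1480), (1914, 1954, 1609)),
    "PentiumIII": (500,  (239, 238, 225),    (407, 393, 283)),
}


def percent_of_peak(rate, peak):
    """Return the measured rate as a percentage of the quoted peak."""
    return 100.0 * rate / peak


def summarize(results):
    """Map each machine to its double and single percent-of-peak triples."""
    summary = {}
    for machine, (peak, double, single) in results.items():
        summary[machine] = {
            "double": tuple(round(percent_of_peak(r, peak), 1) for r in double),
            "single": tuple(round(percent_of_peak(r, peak), 1) for r in single),
        }
    return summary


if __name__ == "__main__":
    # Print one line per machine, in gcc3/gccS/gcc4 order.
    for machine, row in summarize(RESULTS).items():
        print(machine, row["double"], row["single"])
```

The percentages are relative to the PEAK column exactly as listed, so they are only as meaningful as that column.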
Thanks for all the help,
Clint
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827