------- Comment #52 from whaley at cs dot utsa dot edu  2006-08-09 14:33 -------
Paolo,

>In some sense, this is the peephole I would rather *not* do.  But the answer 
>is yes. :-)

Ahh, got it :)

>So, do you now agree that the bug would be fixed if the patch that is in GCC 
>4.2 was backported to GCC 4.1 (so that your users can use that)?

Well, much as I might like to deny it, yes, I must agree the bug is fixed :)  I
think there might still be more performance to get, and initial timings show
that gcc 4 may be slower than gcc 3 on some systems.  However, it will also
clearly be faster than gcc 3 on some (so far, most) systems, and so far it is
competitive everywhere, so not even I can call that a performance bug :)

And yes, getting it into the next gcc release would be very helpful for ATLAS.

>And do you still see the abysmal x87 single-precision FP performance?

No, the problems were the same for both precisions.  I haven't retimed all the
systems, but here are the numbers I do have for the benchmark:

                              DOUBLE            SINGLE
              PEAK        gcc3/gccS/gcc4    gcc3/gccS/gcc4
              ====        ==============    ==============
Pentium-D :   2800        2359/2417/2067    2685/2684/2362
Ath64-X2  :   5600        3681/4011/2102    3716/4256/2207
Opteron   :   3200        2590/2517/1507    2625/2800/1580
P4E       :   2800        1767/1754/1480    1914/1954/1609
PentiumIII:    500        239/238/225       407/393/283

As you can see, on the benchmark, the single precision numbers are now better
than the double.  When exercising the code generator on the Ath64-X2, I cannot
get single precision to reach quite the impressive 93% of peak that double
does, but it gets a respectable 85% of peak (at these levels of performance,
it takes only very minor differences to drop from 93 to 85, so that's not that
unexpected: I am still investigating this).
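
For anyone wanting to turn the table into percent-of-peak figures, here is a
minimal sketch.  It assumes the PEAK column and the timing columns are in the
same units, so the ratio is meaningful.  The helper name percent_of_peak is
hypothetical, and the values are just the Ath64-X2 double-precision row copied
from the table.

```python
# Percent of peak = 100 * measured rate / theoretical peak rate.
# Assumption: PEAK and the measured rates share the same units.

def percent_of_peak(measured, peak):
    """Return the measured rate as a percentage of the peak rate."""
    return 100.0 * measured / peak

# Ath64-X2 row, DOUBLE columns, copied from the table above.
ath64_peak = 5600
ath64_double = {"gcc3": 3681, "gccS": 4011, "gcc4": 2102}

for compiler, rate in ath64_double.items():
    print(f"{compiler}: {percent_of_peak(rate, ath64_peak):.1f}% of peak")
```

By the same arithmetic, 93% and 85% of a 5600 peak would be 5208 and 4760 in
the table's units, if the same peak figure applies to the code generator runs.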

Thanks for all the help,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
