------- Comment #52 from whaley at cs dot utsa dot edu 2006-08-09 14:33 -------

Paolo,

>In some sense, this is the peephole I would rather *not* do. But the answer
>is yes. :-)

Ahh, got it :)

>So, do you now agree that the bug would be fixed if the patch that is in GCC
>4.2 was backported to GCC 4.1 (so that your users can use that)?

Well, much as I might like to deny it, yes, I must agree the bug is fixed :) I think there might still be more performance to get, and initial timings show that 4 may be slower than 3 on some systems. However, it will also clearly be faster than 3 on some (so far, most) systems, and so far it is competitive everywhere, so not even I can call that a performance bug :)

And yes, getting it into the next gcc release would be very helpful for ATLAS.

>And do you still see the abysmal x87 single-precision FP performance?

No, the problems were the same for both precisions. I haven't retimed all the systems, but here are the numbers I do have for the benchmark:

                      DOUBLE          SINGLE
            PEAK  gcc3/gccS/gcc4  gcc3/gccS/gcc4
            ====  ==============  ==============
Pentium-D : 2800  2359/2417/2067  2685/2684/2362
Ath64-X2  : 5600  3681/4011/2102  3716/4256/2207
Opteron   : 3200  2590/2517/1507  2625/2800/1580
P4E       : 2800  1767/1754/1480  1914/1954/1609
PentiumIII:  500  239/238/225     407/393/283

As you can see, on the benchmark, the single precision numbers are now better than the double. When exercising the code generator on the Ath64-X2, I cannot get single precision to run at quite the impressive 93% of peak that double reaches, but it gets a respectable 85% of peak. At these levels of performance, it takes only very minor differences to drop from 93 to 85, so that's not that unexpected; I am still investigating this.

Thanks for all the help,
Clint

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827