------- Additional Comments From athena at fftw dot org  2005-02-16 20:44 -------
> Unfortunately, I doubt whether it'll be possible to simultaneously address
> this performance regression without reintroducing the 3.x issue mentioned in
> the original "PS". I doubt on many platforms two multiply-adds are much
> faster than a single floating point multiplication whose result is shared by
> two additions. Though again it might be possible to do something at the RTL
> level, especially if duplicating the multiplication is a win with -Os.

PowerPC is indeed a platform where an addition costs the same as a multiplication and the same as a fused multiply-add. The ia64 FPU does FMAs only; you code A*B as A*B+(-0), and A+B as A*1+B. (On a related matter, AltiVec has FMA but not multiplication, and the same trick applies.)

Bottom line: gcc should make an effort to respect FMAs, at least when they appear explicitly in the source code.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988