http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
--- Comment #121 from lucier at math dot purdue.edu 2011-04-02 16:58:16 UTC ---
I'm inclined to close this as "Fixed" for 4.6.0.

I've taken the file mentioned in the previous comment and followed the instructions in the readme. The times for a forward FFT of 2^{25} complex doubles on a 2.4 GHz Intel Core i5 on x86_64-apple-darwin10.7.0 are as follows:

With the usual compiler options of

-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp

4.5.2: 2433 ms cpu time (2427 user, 6 system)
4.6.0: 2158 ms cpu time (2154 user, 4 system)

Adding

-fschedule-insns -march=native

to the above:

4.5.2: 2067 ms cpu time (2060 user, 7 system)
4.6.0: 2016 ms cpu time (2012 user, 4 system)

The assembly for the main loop looks much better.
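
For reference, the relative improvement implied by these timings can be worked out with a short script. This is only a sketch: the helper name percent_less_time and the dictionary labels are made up for illustration, and the numbers are simply the total CPU times quoted above.

```python
def percent_less_time(old_ms, new_ms):
    """Return the reduction in run time from old_ms to new_ms, as a percentage."""
    return 100.0 * (old_ms - new_ms) / old_ms


# Total CPU times in ms (4.5.2, 4.6.0), copied from the figures quoted above.
# The labels are illustrative only.
timings = {
    "usual options": (2433, 2158),
    "usual options + -fschedule-insns -march=native": (2067, 2016),
}

for label, (t_452, t_460) in timings.items():
    saved = percent_less_time(t_452, t_460)
    print(f"{label}: 4.6.0 takes {saved:.1f}% less CPU time than 4.5.2")
```

With these inputs the reduction comes out to roughly 11.3% for the usual options and roughly 2.5% once -fschedule-insns -march=native are added, so most of the gain from 4.6.0 shows up in the first configuration.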