https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Peslyak from comment #14)
> For completeness, here are the results for 4.7.x, 4.8.x, and 4.9.0:
>
> 4.7.0o - 2142K c/s, 29692 bytes, 1267 movaps, 465 movups
> 4.7.0h - 2823K c/s, 29692 bytes, 1732 movaps, 0 movups
> 4.7.4o - 2144K c/s, 29692 bytes, 1267 movaps, 465 movups
> 4.7.4h - 2827K c/s, 29692 bytes, 1732 movaps, 0 movups
> 4.8.0o - 1825K c/s, 27813 bytes, 1341 movaps, 721 movups
> 4.8.0h - 2792K c/s, 27813 bytes, 2062 movaps, 0 movups
> 4.8.4o - 1827K c/s, 27807 bytes, 1341 movaps, 721 movups
> 4.8.4h - 2786K c/s, 27807 bytes, 2062 movaps, 0 movups
> 4.9.0o - 1852K c/s, 28262 bytes, 1319 movaps, 721 movups
> 4.9.0h - 2685K c/s, 28262 bytes, 2040 movaps, 0 movups
>
> 4.8 produces the smallest code so far, but even with the aligned loads hack
> is still 6% slower than 4.3.
>
> All of these are with "-O2 -fomit-frame-pointer -Os -funroll-loops
> -finline-functions", like similar results I had posted before. Xeon E5420,
> x86_64.

I'm completely confused now as to what the original regression was reported
against. I thought it was the default options in the Makefile,
-O2 -fomit-frame-pointer, which showed the regression, and that you found -Os
would mitigate it somewhat (and I more specifically told you it is
-fno-tree-pre that makes the actual difference).

So - what options give good results with old compilers but bad results with
new compilers?