https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70686
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|-march=core2 /              |x86_64-*-*
                   |-march=nocona (alternating) |
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2016-04-18
          Component|c                           |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
               Host|Intel Q8200 Quad Core /     |
                   |linux 4.5.0 x64             |
     Ever confirmed|0                           |1
            Summary|-fprofile-generate (not     |GIMPLE if-conversion slows
                   |fprofile-use) somehow       |down code
                   |produces much faster binary |

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's not so mind-blowing - it's simply that -fprofile-generate makes our
GIMPLE-level if-conversion no longer apply.  Without -fprofile-generate we
if-convert the loop into

  for (i = 1; i < 100000001; i++)
    {
      ...
      b = b + ((b < 1.00001) ? i + 12.43 : 0.0);
      ...
    }

thus we always evaluate i + 12.43 and perform one additional addition of
zero on every iteration where the condition is false.

We do this to eventually enable vectorization, but without any check on
whether it would be profitable when not vectorizing (your testcase shows
it's not profitable).

Confirmed.  -fno-tree-loop-if-convert should fix it in this particular case.
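
To make the transformation concrete, here is a minimal sketch of the two loop shapes side by side. It is not the reporter's testcase: the function names, the THRESHOLD macro, the loop bound parameter n and the "b *= 0.5" decay step are all assumptions added so that the condition keeps changing. Only the threshold 1.00001 and the i + 12.43 term come from the comment above.

```c
#include <assert.h>

/* Hypothetical stand-in for the reporter's loop; only the threshold
   and the "i + 12.43" term are taken from the bug comment. */
#define THRESHOLD 1.00001

/* Branchy shape: the addition runs only when the condition holds. */
double loop_branchy(int n)
{
    double b = 0.0;
    for (int i = 1; i < n; i++) {
        b *= 0.5;                /* assumed decay so the condition keeps flipping */
        if (b < THRESHOLD)
            b = b + (i + 12.43);
    }
    return b;
}

/* If-converted shape: i + 12.43 is evaluated and one addition is
   performed on every iteration, adding 0.0 when the condition fails. */
double loop_if_converted(int n)
{
    double b = 0.0;
    for (int i = 1; i < n; i++) {
        b *= 0.5;
        b = b + ((b < THRESHOLD) ? i + 12.43 : 0.0);
    }
    return b;
}

/* Both shapes compute the same value; they differ only in how much
   work each iteration does. */
void check_forms_agree(int n)
{
    assert(loop_branchy(n) == loop_if_converted(n));
}
```

Whether GCC actually keeps the branch in the first function depends on the version and options. Building the same file once with -O2 and once with -O2 -fno-tree-loop-if-convert and timing both is one way to check whether if-conversion is what costs time on a given machine.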