http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53967
Stupachenko Evgeny <evstupac at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |evstupac at gmail dot com

--- Comment #12 from Stupachenko Evgeny <evstupac at gmail dot com> 2012-07-18 09:45:15 UTC ---
I tried it at "-O2" and got low performance with -mfpmath=sse.
It looks like it is caused by a register dependency (%xmm0) between:

    addss    %xmm0, %xmm1
    cvtsi2ss %eax, %xmm0

Renaming %xmm0 in cvtsi2ss to another free register in all such cases resolves
the issue.

You can also try "-O2 -funroll-loops", which made the "sse" code even faster,
and "-O2 -fschedule-insns", which significantly reduced the performance loss in
the "sse" case.
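
For anyone who wants to look at this kind of instruction pair, here is a minimal sketch of a loop that mixes int-to-float conversion with a scalar float add. It is not the test case from this PR; the function name sum_as_float and its parameters are made up for illustration, and the exact instructions and register allocation will depend on the GCC version and target. The relevant background is that cvtsi2ss writes only the low 32 bits of its destination register, so the instruction also depends on the previous contents of that register.

```c
/* Hypothetical reduction loop, NOT the test case attached to the PR.
   With SSE scalar math, the conversion below is typically emitted as
   cvtsi2ss and the accumulation as addss. */
float sum_as_float(const int *v, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        float f = (float)v[i];  /* int -> float conversion */
        acc += f;               /* scalar single-precision add */
    }
    return acc;
}
```

Compiling this with something like "gcc -O2 -msse2 -mfpmath=sse -S" and again with "-funroll-loops" or "-fschedule-insns" added should make it possible to compare which xmm register cvtsi2ss writes to relative to the preceding addss.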