Compile the attached file test.cpp with the following flags:

  g++ -DNDEBUG -m32 -msse2 -O2 -DSHOW_BUG gcc_bug.cpp -S
In the generated code there are useless stores and loads of %xmm0 to -40(%eps) and -56(%eps). When the code is compiled without -DSHOW_BUG, the compiler generates better code without the extra memory accesses. See the attached generated files test_bad.s and test_good.s for the resulting code. This is an old problem; it already exists in 4.1.x.

--
           Summary: Missed optimization causing extra loads and stores when
                    using x86_64 builtin function together with aggregate
                    types.
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jsjodin at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043
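
For readers without access to the attachment, the following is a hypothetical sketch of the kind of code the summary describes: an SSE builtin used through an aggregate type versus through the raw vector type. It is not the attached test.cpp, whose contents are not included in this report. The names Vec2d, add_wrapped, add_raw and sum_pairs are invented for illustration, and the way SHOW_BUG selects between the two variants is an assumption. Whether this exact code shows the extra stores and loads on a given GCC version has not been verified.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_add_pd, ...

// Hypothetical aggregate that wraps a single SSE vector value.
struct Vec2d {
    __m128d v;
};

// Adds two vectors through the aggregate wrapper.
static inline Vec2d add_wrapped(Vec2d a, Vec2d b) {
    Vec2d r;
    r.v = _mm_add_pd(a.v, b.v);
    return r;
}

// Same addition on the raw vector type, with no aggregate involved.
static inline __m128d add_raw(__m128d a, __m128d b) {
    return _mm_add_pd(a, b);
}

// Sums n doubles two at a time; n is assumed to be even.
double sum_pairs(const double* p, int n) {
#ifdef SHOW_BUG
    // Variant assumed to correspond to the "bad" output: the accumulator
    // lives inside an aggregate.
    Vec2d acc;
    acc.v = _mm_setzero_pd();
    for (int i = 0; i < n; i += 2) {
        Vec2d x;
        x.v = _mm_loadu_pd(p + i);
        acc = add_wrapped(acc, x);
    }
    __m128d total = acc.v;
#else
    // Variant assumed to correspond to the "good" output: raw vector type.
    __m128d total = _mm_setzero_pd();
    for (int i = 0; i < n; i += 2) {
        total = add_raw(total, _mm_loadu_pd(p + i));
    }
#endif
    // Horizontal sum of the two lanes.
    double out[2];
    _mm_storeu_pd(out, total);
    return out[0] + out[1];
}
```

Compiling such a file with -S, once with and once without -DSHOW_BUG, and diffing the two assembly outputs is one way to look for stack traffic involving %xmm0 that appears only in the aggregate variant.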