http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54592
Bug #: 54592
Summary: [4.8 Regression] [missed-optimization] Cannot fuse SSE move and add together
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: sgunder...@bigfoot.com

Hi,

I have, on x86-64,

  gcc version 4.7.1 (Debian 4.7.1-9)
  gcc version 4.8.0 20120820 (experimental) [trunk revision 190537] (Debian 20120820-1)

Given the following test program:

  #include <emmintrin.h>

  void func(__m128i *foo, size_t a, size_t b, int *dst)
  {
      __m128i x = foo[a];
      __m128i y = foo[b];
      __m128i sum = _mm_add_epi32(x, y);
      *dst = _mm_cvtsi128_si32(sum);
  }

GCC 4.8 with -O2 compiles it to

   0:  48 c1 e6 04       shl    $0x4,%rsi
   4:  48 c1 e2 04       shl    $0x4,%rdx
   8:  66 0f 6f 0c 17    movdqa (%rdi,%rdx,1),%xmm1
   d:  66 0f 6f 04 37    movdqa (%rdi,%rsi,1),%xmm0
  12:  66 0f fe c1       paddd  %xmm1,%xmm0
  16:  66 0f 7e 01       movd   %xmm0,(%rcx)
  1a:  c3                retq

The mov into %xmm1 here doesn't seem to make sense; it should rather be
paddd-ed in directly. And indeed, GCC 4.7 with -O2 gets this right:

   0:  48 c1 e6 04       shl    $0x4,%rsi
   4:  48 c1 e2 04       shl    $0x4,%rdx
   8:  66 0f 6f 04 37    movdqa (%rdi,%rsi,1),%xmm0
   d:  66 0f fe 04 17    paddd  (%rdi,%rdx,1),%xmm0
  12:  66 0f 7e 01       movd   %xmm0,(%rcx)
  16:  c3                retq

This would seem like a regression to me.
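
For anyone who wants to reproduce this, here is a minimal sketch of the test case with a small driver added. The driver run_func_once() and its input values are hypothetical additions, not part of the original report. They only confirm that both code sequences should produce the same result, since the report concerns instruction selection rather than correctness.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */
#include <stddef.h>     /* size_t, made explicit here */

/* The function from the report, unchanged. */
void func(__m128i *foo, size_t a, size_t b, int *dst)
{
    __m128i x = foo[a];
    __m128i y = foo[b];
    __m128i sum = _mm_add_epi32(x, y);
    *dst = _mm_cvtsi128_si32(sum);
}

/* Hypothetical driver (not in the report): adds foo[0] and foo[1]
 * and returns the low 32-bit lane of the sum. */
int run_func_once(void)
{
    __m128i foo[2];  /* __m128i objects are 16-byte aligned, as movdqa requires */
    int dst = 0;

    /* _mm_set_epi32 takes lanes from high to low, so the last argument
     * is the low lane that _mm_cvtsi128_si32 extracts. */
    foo[0] = _mm_set_epi32(4, 3, 2, 1);
    foo[1] = _mm_set_epi32(40, 30, 20, 10);

    func(foo, 0, 1, &dst);
    assert(dst == 1 + 10);
    return dst;
}
```

The listings above look like objdump output. Assuming the code is saved as test.c (a hypothetical file name), something like "gcc -O2 -c test.c" followed by "objdump -d test.o" should show whether the second load is folded into paddd as a memory operand or emitted as a separate movdqa.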