http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57193
             Bug #: 57193
           Summary: suboptimal register allocation for SSE registers
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: vermaelen.wou...@gmail.com


This bug _might_ be related to PR56339, although that report talks about a
regression compared to 4.7, while this bug seems to be a regression compared
to 4.4.

I was converting some hand-written asm code to SSE intrinsics, but
unfortunately the version using intrinsics generates worse code: it contains
two unnecessary 'movdqa' instructions. I managed to reduce my test to this
routine:

//--------------------------------------------------------------
#include <emmintrin.h>

void test1(const __m128i* in1, const __m128i* in2, __m128i* out,
           __m128i f, __m128i zero)
{
    __m128i c = _mm_avg_epu8(*in1, *in2);
    __m128i l = _mm_unpacklo_epi8(c, zero);
    __m128i h = _mm_unpackhi_epi8(c, zero);
    __m128i m = _mm_mulhi_epu16(l, f);
    __m128i n = _mm_mulhi_epu16(h, f);
    *out = _mm_packus_epi16(m, n);
}
//--------------------------------------------------------------

A (few days old) gcc snapshot generates the following code. Versions 4.5,
4.6 and 4.7 generate similar code:

   0:   66 0f 6f 17             movdqa (%rdi),%xmm2
   4:   66 0f e0 16             pavgb  (%rsi),%xmm2
   8:   66 0f 6f da             movdqa %xmm2,%xmm3
   c:   66 0f 68 d1             punpckhbw %xmm1,%xmm2
  10:   66 0f 60 d9             punpcklbw %xmm1,%xmm3
  14:   66 0f e4 d0             pmulhuw %xmm0,%xmm2
  18:   66 0f 6f cb             movdqa %xmm3,%xmm1
  1c:   66 0f e4 c8             pmulhuw %xmm0,%xmm1
  20:   66 0f 6f c1             movdqa %xmm1,%xmm0
  24:   66 0f 67 c2             packuswb %xmm2,%xmm0
  28:   66 0f 7f 02             movdqa %xmm0,(%rdx)
  2c:   c3                      retq

Gcc versions 4.3 and 4.4 (and clang) generate the following optimal(?) code:

   0:   66 0f 6f 17             movdqa (%rdi),%xmm2
   4:   66 0f e0 16             pavgb  (%rsi),%xmm2
   8:   66 0f 6f da             movdqa %xmm2,%xmm3
   c:   66 0f 68 d1             punpckhbw %xmm1,%xmm2
  10:   66 0f 60 d9             punpcklbw %xmm1,%xmm3
  14:   66 0f e4 d8             pmulhuw %xmm0,%xmm3
  18:   66 0f e4 c2             pmulhuw %xmm2,%xmm0
  1c:   66 0f 67 d8             packuswb %xmm0,%xmm3
  20:   66 0f 7f 1a             movdqa %xmm3,(%rdx)
  24:   c3                      retq
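
For completeness, here is a minimal driver (not part of the original report)
that checks the SSE routine against a scalar reference of the same
average / unpack / multiply-high / pack sequence. The file layout, the chosen
factor 0x4000 and the test data are assumptions for illustration only; the
reporter's exact compile flags are not stated, but on x86-64 SSE2 is available
by default, so compiling both translation units together at -O2 and
disassembling with objdump -d should reproduce listings in the format above.

//--------------------------------------------------------------
// driver.c -- link together with the file containing test1() above
#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

// Routine from the report, compiled in a separate translation unit.
void test1(const __m128i* in1, const __m128i* in2, __m128i* out,
           __m128i f, __m128i zero);

int main(void)
{
    uint8_t a[16], b[16], r[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = (uint8_t)(i * 7);
        b[i] = (uint8_t)(255 - i * 5);
    }

    __m128i in1  = _mm_loadu_si128((const __m128i*)a);
    __m128i in2  = _mm_loadu_si128((const __m128i*)b);
    __m128i f    = _mm_set1_epi16(0x4000);   /* arbitrary 16-bit factor */
    __m128i zero = _mm_setzero_si128();
    __m128i out;

    test1(&in1, &in2, &out, f, zero);
    _mm_storeu_si128((__m128i*)r, out);

    for (int i = 0; i < 16; ++i) {
        unsigned avg = (a[i] + b[i] + 1) >> 1;      /* pavgb: rounded average  */
        unsigned mul = (avg * 0x4000u) >> 16;       /* pmulhuw: high 16 bits   */
        unsigned exp = mul > 255 ? 255 : mul;       /* packuswb: unsigned sat. */
        if (r[i] != exp) {
            printf("mismatch at byte %d: got %u, expected %u\n",
                   i, (unsigned)r[i], exp);
            return 1;
        }
    }
    printf("ok\n");
    return 0;
}
//--------------------------------------------------------------

Both the 4.4-style and the 4.9-style code sequences compute the same result,
so the difference is purely the two extra register-to-register movdqa copies
mentioned above.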