http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54592

             Bug #: 54592
           Summary: [4.8 Regression] [missed-optimization] Cannot fuse SSE
                    move and add together
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: sgunder...@bigfoot.com


Hi,

I have, on x86-64,

  gcc version 4.7.1 (Debian 4.7.1-9) 
  gcc version 4.8.0 20120820 (experimental) [trunk revision 190537] (Debian 20120820-1)

Given the following test program:

  #include <emmintrin.h>

  void func(__m128i *foo, size_t a, size_t b, int *dst)
  {
    __m128i x = foo[a];
    __m128i y = foo[b];
    __m128i sum = _mm_add_epi32(x, y);
    *dst = _mm_cvtsi128_si32(sum);
  }

GCC 4.8 with -O2 compiles it to

   0:    48 c1 e6 04              shl    $0x4,%rsi
   4:    48 c1 e2 04              shl    $0x4,%rdx
   8:    66 0f 6f 0c 17           movdqa (%rdi,%rdx,1),%xmm1
   d:    66 0f 6f 04 37           movdqa (%rdi,%rsi,1),%xmm0
  12:    66 0f fe c1              paddd  %xmm1,%xmm0
  16:    66 0f 7e 01              movd   %xmm0,(%rcx)
  1a:    c3                       retq   

The movdqa into %xmm1 here looks unnecessary; the second load should rather
be folded into paddd as a memory operand. And indeed, GCC 4.7 with -O2 gets
this right:

   0:    48 c1 e6 04              shl    $0x4,%rsi
   4:    48 c1 e2 04              shl    $0x4,%rdx
   8:    66 0f 6f 04 37           movdqa (%rdi,%rsi,1),%xmm0
   d:    66 0f fe 04 17           paddd  (%rdi,%rdx,1),%xmm0
  12:    66 0f 7e 01              movd   %xmm0,(%rcx)
  16:    c3                       retq   

This looks like a regression to me: the 4.8 code has one more instruction, is
four bytes longer, and uses an extra register.
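
For anyone comparing the two compilers, a small harness along these lines
could confirm that both code sequences compute the same result. This is only
a sketch: run_func_example is a hypothetical helper name that is not part of
the test case above, the input values are arbitrary, and the explicit
<stddef.h> include is added only to make size_t visible on its own.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Test function from the report, unchanged. */
void func(__m128i *foo, size_t a, size_t b, int *dst)
{
  __m128i x = foo[a];
  __m128i y = foo[b];
  __m128i sum = _mm_add_epi32(x, y);
  *dst = _mm_cvtsi128_si32(sum);
}

/* Hypothetical helper (not in the report): calls func on known data and
   returns the low 32-bit lane of foo[0] + foo[1]. */
int run_func_example(void)
{
  __m128i foo[2];  /* __m128i objects are 16-byte aligned, as movdqa needs */
  int dst = 0;

  /* _mm_set_epi32 takes lanes from high to low, so the last argument
     is the low lane that _mm_cvtsi128_si32 extracts. */
  foo[0] = _mm_set_epi32(4, 3, 2, 1);
  foo[1] = _mm_set_epi32(40, 30, 20, 10);

  func(foo, 0, 1, &dst);
  return dst;  /* low lanes: 1 + 10 */
}
```

Calling run_func_example() from a main() built with each compiler should give
the same value either way; only the generated code for func should differ.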
