https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

            Bug ID: 94866
           Summary: Failure to optimize pinsrq of 0 with index 1 into movq
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <cstdint>

typedef int64_t v2di __attribute__((vector_size(16)));
typedef int32_t v2si __attribute__((vector_size(8)));

v2di _mm_move_epi64(v2di a)
{
    return v2di{a[0], 0LL};
}

With `-O3 -msse4.1`, LLVM compiles this to:

_mm_move_epi64(long __vector(2)): # @_mm_move_epi64(long __vector(2))
  movq xmm0, xmm0 # xmm0 = xmm0[0],zero
  ret

GCC gives:

_mm_move_epi64(long __vector(2)):
  xor eax, eax
  pinsrq xmm0, rax, 1
  ret

GCC's output needs two instructions (an xor to zero a general-purpose register,
then a pinsrq to move it into the vector register) where LLVM needs a single
movq. Unless I've missed some quirk of x86 chips, LLVM's version should be
faster.
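
As a rough sketch of why the two sequences above are interchangeable, the
same operation can be written either as a vector literal or as an insert of 0
into lane 1. The function names zero_upper_literal and zero_upper_insert are
made up for this illustration and are not part of the original test case;
both rely only on the GCC vector extensions already used above.

```cpp
#include <cstdint>

typedef int64_t v2di __attribute__((vector_size(16)));

// Form used in the report: keep lane 0 and zero lane 1.
// This is the result that "movq xmm0, xmm0" produces.
v2di zero_upper_literal(v2di a)
{
    return v2di{a[0], 0LL};
}

// Same result written as an insert of 0 into lane 1,
// which is what "xor eax, eax; pinsrq xmm0, rax, 1" computes.
v2di zero_upper_insert(v2di a)
{
    a[1] = 0;
    return a;
}
```

Both functions return the same value for every input, so a compiler is free to
emit the single movq for either of them.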
