https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug ID: 94866
Summary: Failure to optimize pinsrq of 0 with index 1 into movq
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

#include <stdint.h>

typedef int64_t v2di __attribute__((vector_size(16)));
typedef int32_t v2si __attribute__((vector_size(8)));

v2di _mm_move_epi64(v2di a)
{
    return v2di{a[0], 0LL};
}

LLVM with `-O3 -msse4.1` compiles this to:

_mm_move_epi64(long __vector(2)): # @_mm_move_epi64(long __vector(2))
  movq xmm0, xmm0 # xmm0 = xmm0[0],zero
  ret

GCC gives:

_mm_move_epi64(long __vector(2)):
  xor eax, eax
  pinsrq xmm0, rax, 1
  ret

GCC's output looks like it should be noticeably slower: it zeroes a general-purpose register and then inserts it into the vector, where a single movq does the same job. Unless I've missed some quirk of x86 chips, LLVM's version should be faster.
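
For anyone wanting to reproduce this, here is a minimal sketch of a standalone test file. It is not part of the original report. The names `zero_upper_lane` and `self_check` are hypothetical, chosen so the file does not clash with the real `_mm_move_epi64` intrinsic if `<emmintrin.h>` is ever included. The `noinline` attribute is an assumption on my part, added so the function body stays visible as a separate symbol in the assembly output.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang generic vector type: two 64-bit lanes, 16 bytes total.
typedef int64_t v2di __attribute__((vector_size(16)));

// Same operation as in the report: keep lane 0, zero lane 1.
// Hypothetical name; noinline keeps it as a standalone function in the .s file.
__attribute__((noinline)) v2di zero_upper_lane(v2di a)
{
    return v2di{a[0], 0LL};
}

// Checks only the semantics (low lane preserved, high lane cleared).
// It says nothing about which instructions the compiler picked.
void self_check()
{
    v2di in = {0x1122334455667788LL, -1LL};
    v2di out = zero_upper_lane(in);
    assert(out[0] == in[0]);
    assert(out[1] == 0);
}
```

Compiling this with `-O3 -msse4.1 -S` on x86-64 and inspecting the body of `zero_upper_lane` should show whether a given compiler version emits the `xor`/`pinsrq` pair or a single `movq`.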