https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94870

            Bug ID: 94870
           Summary: Failure to use movhlps instead of separate
                    mov+unpckhpd
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef double v2df __attribute__((vector_size(16)));

v2df _mm_sqrt_sd(v2df a, v2df b)
{
    v2df c = __builtin_ia32_sqrtpd((v2df){b[0], b[1]});
    return (v2df){c[1], a[1]};
}

With -O3, LLVM outputs:

_mm_sqrt_sd(double __vector(2), double __vector(2)):
  sqrtpd xmm1, xmm1
  movhlps xmm0, xmm1 # xmm0 = xmm1[1],xmm0[1]
  ret

GCC outputs:

_mm_sqrt_sd(double __vector(2), double __vector(2)):
  movapd xmm2, xmm0
  sqrtpd xmm0, xmm1
  unpckhpd xmm0, xmm2
  ret

unpckhpd and movhlps seem to have equivalent performance, so using movhlps here
to elide the extra movapd looks like a worthwhile improvement.
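
To make it easier to see why the two sequences are interchangeable, here is a minimal sketch that models both instructions on the report's v2df type. The helper names (model_movhlps, model_unpckhpd, reference, sequences_agree) are hypothetical and only illustrate the lane movement; they are not GCC internals, and the sqrtpd result is passed in as a plain value c rather than computed.

```c
#include <assert.h>

typedef double v2df __attribute__((vector_size(16)));

/* Model of `movhlps dst, src`: the low 64 bits of dst receive the high
   64 bits of src; the high 64 bits of dst are left unchanged. */
v2df model_movhlps(v2df dst, v2df src)
{
    dst[0] = src[1];
    return dst;
}

/* Model of `unpckhpd dst, src`: dst becomes { dst[1], src[1] }. */
v2df model_unpckhpd(v2df dst, v2df src)
{
    return (v2df){dst[1], src[1]};
}

/* The shuffle from the report, with c standing in for the sqrtpd result. */
v2df reference(v2df a, v2df c)
{
    return (v2df){c[1], a[1]};
}

/* Returns 1 if both instruction models reproduce the reference shuffle. */
int sequences_agree(v2df a, v2df c)
{
    v2df ref = reference(a, c);
    /* movhlps: a is the destination, so it can stay in xmm0. */
    v2df via_movhlps = model_movhlps(a, c);
    /* unpckhpd: a is the source, so it has to be copied out of xmm0 first. */
    v2df via_unpckhpd = model_unpckhpd(c, a);

    assert(ref[1] == a[1]); /* high lane always comes from a */
    return ref[0] == via_movhlps[0] && ref[1] == via_movhlps[1]
        && ref[0] == via_unpckhpd[0] && ref[1] == via_unpckhpd[1];
}
```

In this model the only difference is which operand acts as the destination: movhlps overwrites a in place, whereas unpckhpd needs a as the source, which is where the extra movapd comes from.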
