[llvm-bugs] [Bug 39432] New: [SLPVectorizer] Investigate using poor throughput instructions as seed points

via llvm-bugs Thu, 25 Oct 2018 09:26:55 -0700

https://bugs.llvm.org/show_bug.cgi?id=39432


            Bug ID: 39432
           Summary: [SLPVectorizer] Investigate using poor throughput
                    instructions as seed points
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedb...@nondot.org
          Reporter: llvm-...@redking.me.uk
                CC: a.bat...@hotmail.com, andrea.dibia...@gmail.com,
                    dtemirbula...@gmail.com, llvm-bugs@lists.llvm.org

Even for otherwise very different code paths, it's often very beneficial to
vectorize poor-throughput instructions (FDIV and FSQRT in particular), as they
are usually the bottleneck:

Codegen: https://godbolt.org/z/bWkftx

LLVM MCA Analysis: https://godbolt.org/z/eYmVHk

void prim(double x, double y, double z, double w, double *p0, double *p1) {
    x -= z;
    y += w;
    x /= z;
    y /= w;
    x -= z;
    y += w;
    *p0++ = x;
    *p1++ = y;
}

_Z4primddddPdS_: # @_Z4primddddPdS_
  vsubsd %xmm2, %xmm0, %xmm0
  vaddsd %xmm1, %xmm3, %xmm1
  vdivsd %xmm2, %xmm0, %xmm0
  vdivsd %xmm3, %xmm1, %xmm1
  vsubsd %xmm2, %xmm0, %xmm0
  vaddsd %xmm3, %xmm1, %xmm1
  vmovsd %xmm0, (%rdi)
  vmovsd %xmm1, (%rsi)
  retq

block throughput: 38cy

#include <immintrin.h>

void prim2(double x, double y, double z, double w, double *p0, double *p1) {
    x -= z;
    y += w;
    __m128d xy = _mm_div_pd(_mm_setr_pd(x, y), _mm_setr_pd(z, w));
    x = xy[0];
    y = xy[1];
    x -= z;
    y += w;
    *p0++ = x;
    *p1++ = y;
}

_Z5prim2ddddPdS_: # @_Z5prim2ddddPdS_
  vsubsd %xmm2, %xmm0, %xmm0
  vaddsd %xmm1, %xmm3, %xmm1
  vunpcklpd %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[0]
  vunpcklpd %xmm3, %xmm2, %xmm1 # xmm1 = xmm2[0],xmm3[0]
  vdivpd %xmm1, %xmm0, %xmm0
  vpermilpd $1, %xmm0, %xmm1 # xmm1 = xmm0[1,0]
  vsubsd %xmm2, %xmm0, %xmm0
  vaddsd %xmm3, %xmm1, %xmm1
  vmovsd %xmm0, (%rdi)
  vmovsd %xmm1, (%rsi)
  retq

block throughput: 19cy
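
The same hand-vectorization could presumably be applied to FSQRT, the other
instruction named above. The following is only a sketch of that idea: the
function names prim_sqrt and prim_sqrt2 are hypothetical and not part of the
original report, the lane extraction uses SSE2 intrinsics rather than the
xy[0] vector subscript, and no codegen or llvm-mca numbers were collected for
it.

```cpp
#include <cmath>
#include <immintrin.h>

// Scalar form: two independent square roots on otherwise different paths.
void prim_sqrt(double x, double y, double z, double w, double *p0, double *p1) {
    x -= z;
    y += w;
    x = std::sqrt(x);
    y = std::sqrt(y);
    x -= z;
    y += w;
    *p0 = x;
    *p1 = y;
}

// Hand-vectorized form: only the two square roots are packed into one
// SSE2 operation; the surrounding arithmetic stays scalar.
void prim_sqrt2(double x, double y, double z, double w, double *p0, double *p1) {
    x -= z;
    y += w;
    __m128d xy = _mm_sqrt_pd(_mm_setr_pd(x, y));
    x = _mm_cvtsd_f64(xy);                       // lane 0
    y = _mm_cvtsd_f64(_mm_unpackhi_pd(xy, xy));  // lane 1
    x -= z;
    y += w;
    *p0 = x;
    *p1 = y;
}
```

Both forms should produce identical results, since scalar and packed square
roots are each correctly rounded; whether the packed form is actually faster
would need to be checked on the target CPU.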

-- 
You are receiving this mail because:
You are on the CC list for the bug.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
