[llvm-bugs] [Bug 48429] New: Generated scatter instructions are slower than scalar version

via llvm-bugs Mon, 07 Dec 2020 10:20:00 -0800

https://bugs.llvm.org/show_bug.cgi?id=48429


            Bug ID: 48429
           Summary: Generated scatter instructions are slower than scalar
                    version
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedb...@nondot.org
          Reporter: car...@google.com
                CC: craig.top...@gmail.com, llvm-bugs@lists.llvm.org,
                    llvm-...@redking.me.uk, pengfei.w...@intel.com,
                    spatel+l...@rotateright.com

Compile the following code (scatter.cc) with this command line:

clang  '--target=x86_64-grtev4-linux-gnu' -maes -m64 -mcx16 -msse4.2 -mpclmul
'-mprefer-vector-width=128' -fexperimental-new-pass-manager
-fsized-deallocation -O3 '-std=gnu++17' -c scatter.cc -save-temps

  __attribute__((target("avx,avx2,fma,avx512f,avx512dq,avx512bw")))
  void foo(int d, const float* ptr, float* dest)
  {
      const float* ptr_end = ptr + d;
      for (; ptr != ptr_end; ++ptr, dest += 16) {
          dest[0] = ptr[-1 * d];
          dest[1] = ptr[0 * d];
          dest[2] = ptr[1 * d];
          dest[3] = ptr[2 * d];
      }
  }


LLVM generates 4-element scatters, which are more than 50% slower than the
scalar version on my Skylake desktop.

The problem is in the function X86TTIImpl::getGatherScatterOpCost(). It has
already determined that a scatter is not profitable when avx512vl is not
enabled, so the operation should be scalarized, and it returns a scalarized
cost. But the caller, LoopVectorize, doesn't know that this is a scalarized
cost. It treats it as a scatter cost and compares it with a different scalar
cost computed by getMemInstScalarizationCost. Unfortunately, the scalar cost
computed by the X86 backend is smaller than the one computed by LoopVectorize,
so LoopVectorize concludes that the scatter is cheaper than scalarizing and
generates the slow scatter version.
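
The mismatch described above can be illustrated with a small toy model. This
is only a sketch, not LLVM code: the names targetScatterCost, chooseWidening
and Widening, and all of the cost numbers, are hypothetical and chosen only to
show how two disagreeing scalarization estimates can lead to picking the
scatter.

```cpp
#include <cassert>

// Hypothetical toy model of the cost comparison. None of these names or
// numbers come from LLVM; they only illustrate the mismatch.
enum class Widening { Scalarize, Scatter };

// Stand-in for the target hook: when a real scatter is judged unprofitable,
// it returns the target's own estimate of the scalarized cost instead.
constexpr int targetScatterCost(bool scatterProfitable,
                                int targetScalarizedCost,
                                int realScatterCost) {
    return scatterProfitable ? realScatterCost : targetScalarizedCost;
}

// Stand-in for the vectorizer's decision: it assumes the reported number is
// a true scatter cost and picks the scatter whenever that number is smaller
// than its own scalarization estimate.
constexpr Widening chooseWidening(int reportedScatterCost,
                                  int vectorizerScalarizedCost) {
    return reportedScatterCost < vectorizerScalarizedCost
               ? Widening::Scatter
               : Widening::Scalarize;
}

inline void runExample() {
    // Assumed numbers: the target estimates scalarization at 10, the
    // vectorizer estimates it at 14, and a real scatter would cost 40.
    const int reported = targetScatterCost(false, 10, 40);
    assert(reported == 10);  // really a scalarized cost, not a scatter cost
    // The vectorizer sees 10 < 14 and picks the (slow) scatter.
    assert(chooseWidening(reported, 14) == Widening::Scatter);
    // If both sides agreed on the scalarized cost, scalarization would win.
    assert(chooseWidening(targetScatterCost(false, 14, 40), 14) ==
           Widening::Scalarize);
}
```

In this model the scatter is chosen only because the two scalarization
estimates disagree; the real scatter cost never enters the comparison.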

