https://bugs.llvm.org/show_bug.cgi?id=49347

            Bug ID: 49347
           Summary: Memory access versioning adds bad(?) runtime predicate
                    to vectorized loop
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedb...@nondot.org
          Reporter: mattias.v.eriks...@ericsson.com
                CC: llvm-bugs@lists.llvm.org

Created attachment 24571
  --> https://bugs.llvm.org/attachment.cgi?id=24571&action=edit
LV input

With the attached file, loop vectorization adds a runtime check so that the
vectorized loop only runs when numOutputs == 1:

opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4
[...]
  %ident.check = icmp ne i32 %numOutputs, 1
  %10 = or i1 %9, %ident.check
[...]
  %17 = or i1 %10, %16
  br i1 %17, label %scalar.ph, label %vector.ph
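
For readers without the attachment, here is a rough source-level sketch of
what such a predicate amounts to. The attached lv-mav.ll is not reproduced
here, so the loop body, the function names and every parameter other than
numOutputs are hypothetical; the sketch only assumes that numOutputs is used
as a symbolic stride in an address computation.

```c
#include <stddef.h>

/* Hypothetical loop with a symbolic stride: which elements of `in` are
   read depends on the runtime value of numOutputs (assumed positive). */
void scale_strided(float *out, const float *in, size_t n, int numOutputs) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i * (size_t)numOutputs] * 2.0f;
}

/* Source-level picture of the versioned loop: the specialised body is
   only entered when the stride is 1; every other value falls back to
   the original scalar loop. */
void scale_versioned(float *out, const float *in, size_t n, int numOutputs) {
    if (numOutputs != 1) {                      /* plays the role of %ident.check */
        scale_strided(out, in, n, numOutputs);  /* scalar.ph path */
        return;
    }
    /* vector.ph path: with stride 1 the loads are consecutive, so this
       loop is the one that would be vectorized. */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * 2.0f;
}
```

Both functions compute the same result for any positive stride; the only
difference is which loop body executes, which is the performance concern
here.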

Running the vectorizer without memory access versioning, I get a partially
vectorized loop without the check on numOutputs:

opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4 -enable-mem-access-versioning=0
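
As a sketch of what "partially vectorized" can mean at source level, the
following C processes four iterations per step for any stride. It is an
assumption about the general shape only, not a transcription of the actual
opt output: the function name, the loop body and the constant VF are
hypothetical, and the inner lane loops stand in for scalarized strided
loads feeding vector arithmetic and a consecutive store.

```c
#include <stddef.h>

enum { VF = 4 };  /* mirrors -force-vector-width=4 */

/* Stride-agnostic chunked loop: no check on numOutputs is needed,
   because the strided loads are done one lane at a time. */
void scale_partial(float *out, const float *in, size_t n, int numOutputs) {
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        float lanes[VF];
        for (int l = 0; l < VF; ++l)   /* scalarized strided loads */
            lanes[l] = in[(i + (size_t)l) * (size_t)numOutputs];
        for (int l = 0; l < VF; ++l)   /* stands in for one vector multiply */
            lanes[l] *= 2.0f;
        for (int l = 0; l < VF; ++l)   /* consecutive (vectorizable) store */
            out[i + (size_t)l] = lanes[l];
    }
    for (; i < n; ++i)                 /* scalar remainder iterations */
        out[i] = in[i * (size_t)numOutputs] * 2.0f;
}
```

The point of the sketch is that this form stays usable when numOutputs != 1,
at the price of slower loads than in the stride-1 specialisation.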

In a performance issue I am looking at in my out-of-tree target, the partially
vectorized loop is faster than the scalar loop, but the check on numOutputs
makes the code always run the scalar loop. The vector code looks better when
numOutputs == 1, but it is worse in practice since the predicate is rarely
fulfilled.

I wonder if what LV does here makes sense in general? Is it a good idea to add
predicates like this and have the more general case only run the scalar version
of the loop?
