https://bugs.llvm.org/show_bug.cgi?id=49347
Bug ID: 49347
Summary: Memory access versioning adds bad(?) runtime predicate
to vectorized loop
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedb...@nondot.org
Reporter: mattias.v.eriks...@ericsson.com
CC: llvm-bugs@lists.llvm.org
Created attachment 24571
--> https://bugs.llvm.org/attachment.cgi?id=24571&action=edit
LV input
With the attached file, loop vectorization adds a runtime check so that the
vectorized loop only runs when numOutputs == 1:
opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4
[...]
%ident.check = icmp ne i32 %numOutputs, 1
%10 = or i1 %9, %ident.check
[...]
%17 = or i1 %10, %16
br i1 %17, label %scalar.ph, label %vector.ph
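At source level, the effect of that predicate can be sketched roughly as below. This is only an illustration under assumptions: the loop body, the function name scaleVersioned and the access pattern in[i * numOutputs] are hypothetical and are not taken from the attached lv-mav.ll. The sketch only mirrors the branch structure above, where a value other than 1 sends execution to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical loop shape (not the loop from the attachment): the input is
// read with a symbolic stride, numOutputs. Returns true when the fast path
// (the stand-in for vector.ph) was taken.
bool scaleVersioned(const int *in, int *out, std::size_t n, int numOutputs) {
  assert(numOutputs > 0);
  if (numOutputs == 1) {
    // Stride speculated to be 1: accesses are contiguous, so this version
    // is the one that is easy to vectorize.
    for (std::size_t i = 0; i < n; ++i)
      out[i] = in[i] * 2;
    return true;
  }
  // Any other stride falls back to the scalar loop (scalar.ph).
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i * static_cast<std::size_t>(numOutputs)] * 2;
  return false;
}
```

If numOutputs is rarely 1 at run time, nearly every call in this sketch ends up in the second loop.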
Running the vectorizer without memory access versioning, I get a partially
vectorized loop without the check on numOutputs:
opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4 -enable-mem-access-versioning=0
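For illustration, a partially vectorized loop with no stride check could look roughly like the sketch below. It is an assumption-laden model rather than what LV actually emits for the attachment: the function name scaleStridedBy4, the loop body and the lane layout are hypothetical. The strided loads stay scalar, one per lane, while the arithmetic and the contiguous stores are done four elements at a time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a partially vectorized strided loop, width 4.
void scaleStridedBy4(const int *in, int *out, std::size_t n,
                     std::size_t stride) {
  assert(stride > 0);
  constexpr std::size_t VF = 4;  // matches -force-vector-width=4
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    std::array<int, VF> lanes;
    for (std::size_t l = 0; l < VF; ++l)
      lanes[l] = in[(i + l) * stride];  // strided loads, one per lane
    for (std::size_t l = 0; l < VF; ++l)
      lanes[l] *= 2;                    // lane-wise arithmetic
    for (std::size_t l = 0; l < VF; ++l)
      out[i + l] = lanes[l];            // contiguous store
  }
  // Scalar remainder for the last n % VF iterations.
  for (; i < n; ++i)
    out[i] = in[i * stride] * 2;
}
```

This form works for any stride, which is why it needs no runtime predicate on numOutputs.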
In a performance issue I am investigating on my out-of-tree target, the
partially vectorized loop is faster than the scalar loop, but the check on
numOutputs means the code always runs the scalar loop. The vector code looks
better when numOutputs == 1, but it is worse in practice since that predicate
is rarely fulfilled.
I wonder if what LV does here makes sense in general? Is it a good idea to add
predicates like this and have the more general case only run the scalar version
of the loop?