Issue |
148577
|
Summary |
Stores with invariant address unnecessarily widened with EVL tail folding
|
Labels |
vectorizers
|
Assignees |
lukel97
|
Reporter |
lukel97
|
https://godbolt.org/z/9a6b9oe5c
Given a loop that has a store to an invariant address:
```llvm
define void @f(ptr %p, ptr %q, i32 %n) {
entry:
br label %loop
loop:
%iv = phi i32 [0, %entry], [%iv.next, %loop]
%gep = getelementptr i32, ptr %p, i32 %iv
%x = load i32, ptr %gep
%y = add i32 %x, 1
store i32 %y, ptr %gep
store i32 %y, ptr %q ; address invariant
%iv.next = add i32 %iv, 1
%done = icmp eq i32 %iv.next, %n
br i1 %done, label %exit, label %loop
exit:
ret void
}
```
Typically we extract the corresponding element and emit a scalar store:
```llvm
%wide.load = load <vscale x 4 x i32>, ptr %14, align 4
%15 = add <vscale x 4 x i32> %wide.load, splat (i32 1)
%19 = extractelement <vscale x 4 x i32> %15, i32 %18
store i32 %19, ptr %q, align 4
```
However with EVL tail folding we fail to do this and instead emit a scatter:
```
%vp.op.load = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 %13, <vscale x 4 x i1> splat (i1 true), i32 %11)
%14 = add <vscale x 4 x i32> %vp.op.load, splat (i32 1)
call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %14, ptr align 4 %13, <vscale x 4 x i1> splat (i1 true), i32 %11)
call void @llvm.vp.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> %14, <vscale x 4 x ptr> align 4 %broadcast.splat, <vscale x 4 x i1> splat (i1 true), i32 %11)
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs