| | |
|---|---|
| Issue | 134513 |
| Summary | Auto-vectorization via `masked.load` blocks constprop |
| Labels | new issue |
| Assignees | |
| Reporter | scottmcm |

I was writing some Rust code and ended up with the following IR. Every input is a constant, so the function should fold to just `ret i64 165`, but the masked load that autovectorization emits with `-Ctarget-cpu=x86-64-v3` keeps that from happening:
```llvm
define noundef i64 @test() unnamed_addr #0 {
bb3.preheader:
  %iter = alloca [64 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 64, ptr nonnull %iter)
  %_3.sroa.5.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 16
  store <4 x i64> <i64 23, i64 16, i64 54, i64 3>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8
  %_3.sroa.9.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 48
  store i64 60, ptr %_3.sroa.9.0.iter.sroa_idx, align 8
  %_3.sroa.10.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 56
  store i64 9, ptr %_3.sroa.10.0.iter.sroa_idx, align 8
  %unmaskedload = load <4 x i64>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8, !alias.scope !2
  %0 = getelementptr inbounds nuw i8, ptr %iter, i64 48
  %wide.masked.load.1 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr nonnull %0, i32 8, <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> poison), !alias.scope !2
  %1 = add <4 x i64> %wide.masked.load.1, %unmaskedload
  %2 = shufflevector <4 x i64> %1, <4 x i64> %unmaskedload, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
  %3 = tail call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %2)
  call void @llvm.lifetime.end.p0(i64 64, ptr nonnull %iter)
  ret i64 %3
}
```
It looks like trunk can't optimize that to a constant either: <https://llvm.godbolt.org/z/z6MKz6cz1>
(Trunk at least doesn't need the store-load of the vector constant, but it still doesn't const-prop the stores and the `masked.load`.)
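
To spell out where 165 comes from, here is a rough lane-by-lane model of the IR above in plain Rust. It is only a sketch for checking the arithmetic, not the original Rust source and not how LLVM would implement the fold. The names `masked_load` and `expected_result` are made up for illustration, and `None` stands in for a `poison` lane.

```rust
/// Simplified model of `llvm.masked.load` on `<4 x i64>`: lanes whose mask
/// bit is false never touch memory and take the passthrough value, which is
/// `poison` in the IR and `None` here.
fn masked_load(mem: &[i64], mask: [bool; 4]) -> [Option<i64>; 4] {
    let mut out = [None; 4];
    for lane in 0..4 {
        if mask[lane] {
            out[lane] = Some(mem[lane]);
        }
    }
    out
}

/// Walks the instructions of `@test` using the constants from its stores.
fn expected_result() -> i64 {
    // The six i64 values stored at byte offsets 16, 24, 32, 40, 48, 56 of `%iter`.
    let stored: [i64; 6] = [23, 16, 54, 3, 60, 9];

    // %unmaskedload: full vector load from offset 16.
    let unmasked: [i64; 4] = [stored[0], stored[1], stored[2], stored[3]];

    // %wide.masked.load.1: offset 48, mask <true, true, false, false>.
    // Only two i64 slots remain in the alloca, hence the two-element slice.
    let masked = masked_load(&stored[4..], [true, true, false, false]);

    // %1 = add: a poison lane stays poison.
    let sum: [Option<i64>; 4] =
        std::array::from_fn(|i| masked[i].map(|m| m + unmasked[i]));

    // %2 = shufflevector <0, 1, 6, 7>: lanes 0 and 1 come from %1,
    // lanes 2 and 3 come from %unmaskedload, so no poison lane survives.
    let shuffled = [sum[0], sum[1], Some(unmasked[2]), Some(unmasked[3])];

    // %3 = llvm.vector.reduce.add
    shuffled
        .iter()
        .map(|lane| lane.expect("poison lane reached the reduction"))
        .sum()
}
```

Calling `expected_result()` gives 83 + 25 + 54 + 3 = 165, the sum of all six stored values, which is the constant the function should return.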