Issue 134513
Summary Auto-vectorization via `masked.load` blocks constprop
Labels new issue
Assignees
Reporter scottmcm
    I was writing some code in Rust and ended up with the following IR, where even though everything's a constant -- it should just be `ret i64 165` -- the masked loads from autovectorization on `-Ctarget-cpu=x86-64-v3` kept that from happening:
```llvm
define noundef i64 @test() unnamed_addr #0 {
bb3.preheader:
  %iter = alloca [64 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 64, ptr nonnull %iter)
 %_3.sroa.5.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 16
  store <4 x i64> <i64 23, i64 16, i64 54, i64 3>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8
  %_3.sroa.9.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 48
  store i64 60, ptr %_3.sroa.9.0.iter.sroa_idx, align 8
  %_3.sroa.10.0.iter.sroa_idx = getelementptr inbounds nuw i8, ptr %iter, i64 56
  store i64 9, ptr %_3.sroa.10.0.iter.sroa_idx, align 8
  %unmaskedload = load <4 x i64>, ptr %_3.sroa.5.0.iter.sroa_idx, align 8, !alias.scope !2
  %0 = getelementptr inbounds nuw i8, ptr %iter, i64 48
  %wide.masked.load.1 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr nonnull %0, i32 8, <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> poison), !alias.scope !2
  %1 = add <4 x i64> %wide.masked.load.1, %unmaskedload
  %2 = shufflevector <4 x i64> %1, <4 x i64> %unmaskedload, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
  %3 = tail call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %2)
  call void @llvm.lifetime.end.p0(i64 64, ptr nonnull %iter)
  ret i64 %3
}
```

It looks like trunk can't optimize that to a constant either: <https://llvm.godbolt.org/z/z6MKz6cz1>

(Trunk at least doesn't need the store-load of the vector constant, but it still doesn't const-prop the stores and the `masked.load`.)
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to