[llvm-bugs] [Bug 129899] s390x: vector cast using shuffle does not optimize well

LLVM Bugs via llvm-bugs Wed, 05 Mar 2025 08:35:58 -0800

Issue	129899
Summary	s390x: vector cast using shuffle does not optimize well
Labels	new issue
Assignees
Reporter	folkertdev

    https://godbolt.org/z/6sodYY3fW

This LLVM IR


```llvm
define range(i64 -128, 128) <2 x i64> @manual_vec_extend_s64(<16 x i8> %a) unnamed_addr {
start:
  %0 = shufflevector <16 x i8> %a, <16 x i8> poison, <2 x i32> <i32 7, i32 15>
  %1 = sext <2 x i8> %0 to <2 x i64>
  ret <2 x i64> %1
}
``` 

does not optimize to a single instruction on s390x.

The C code uses a slightly different (more manual) lowering to LLVM IR:

https://godbolt.org/z/aencTa3nq

```llvm
define dso_local <2 x i64> @a(<16 x i8> noundef %a) local_unnamed_addr {
entry:
  %vecext.i = extractelement <16 x i8> %a, i64 7
  %conv.i = sext i8 %vecext.i to i64
  %vecinit.i = insertelement <2 x i64> poison, i64 %conv.i, i64 0
  %vecext1.i = extractelement <16 x i8> %a, i64 15
  %conv2.i = sext i8 %vecext1.i to i64
  %vecinit3.i = insertelement <2 x i64> %vecinit.i, i64 %conv2.i, i64 1
  ret <2 x i64> %vecinit3.i
}
```

but Rust can't replicate that at the moment. That's a bug we'll fix on the Rust side, but I still think the `shufflevector` form should optimize just as well. It is also more compact in LLVM IR, so might it be the preferable lowering for clang too?
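
For readers without access to the godbolt links, the element-wise pattern in the clang-generated IR corresponds roughly to C like the following. This is only a sketch using the GCC/Clang vector extensions: the type names `v16i8` and `v2i64` and the function name `extend_s64_sketch` are made up for illustration, and the actual source behind the link may differ.

```c
/* Sketch only: the type and function names here are hypothetical. */
typedef signed char v16i8 __attribute__((vector_size(16)));
typedef long long v2i64 __attribute__((vector_size(16)));

/* Sign-extend byte lanes 7 and 15 into the two 64-bit lanes,
 * mirroring the extractelement / sext / insertelement sequence. */
static inline v2i64 extend_s64_sketch(v16i8 a) {
    v2i64 r = { a[7], a[15] };
    return r;
}
```

The `shufflevector` form expresses the same computation as one lane selection followed by a single vector `sext`, which is why both would be expected to produce the same machine code.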

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
