| Field | Value |
|---|---|
| Issue | 129276 |
| Summary | [X86] Suboptimal codegen for broadcasting a 16-bit vector element |
| Labels | new issue |
| Assignees | |
| Reporter | dzaima |

This LLVM IR:
```llvm
define <16 x i16> @broadcast_sel15(<16 x i16> noundef %x) {
%r = shufflevector <16 x i16> %x, <16 x i16> poison, <16 x i32> splat(i32 15)
ret <16 x i16> %r
}
```
could be compiled to:
```asm
vpshufhw ymm0, ymm0, 255
vpermq ymm0, ymm0, 255
```
but LLVM produces:
```asm
vpshufhw ymm0, ymm0, 255
vpbroadcastd ymm1, dword ptr [rip + .LCPI15_0] ; 6
vpermd ymm0, ymm1, ymm0
```
[All 16 cases with C intrinsics, with gcc for comparison](https://c.godbolt.org/z/r6ssxno7z), and [direct LLVM IR](https://llvm.godbolt.org/z/eYvbrjvME).
Indices 12..15 are the most problematic ones, but the codegen for index 8 (`vextracti128`+`vpbroadcastw`) would likely also be better off as `vpshuflw`+`vpermq`, which avoids having two cross-lane ops. The rest are fine either way, I think.
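
For anyone who wants to sanity-check the immediates, here is a rough pure-Python model of the in-lane shuffle + `vpermq` pattern. It is only a sketch: the three functions imitate the documented behaviour of `vpshuflw`, `vpshufhw` and `vpermq` on a 16 x i16 value, and `broadcast_word` is a hypothetical helper of mine, not anything in LLVM. It shows that the two-instruction form is *correct* for every index; it says nothing about whether it is the best choice for each one.

```python
# Pure-Python model of three AVX2 instructions on a 16 x i16 vector
# (list index 0 is the lowest word). Illustrative only, not LLVM code.

def vpshuflw(v, imm):
    """Shuffle the low 4 words of each 128-bit lane; high 4 words pass through."""
    out = list(v)
    for lane in (0, 8):
        for j in range(4):
            out[lane + j] = v[lane + ((imm >> (2 * j)) & 3)]
    return out

def vpshufhw(v, imm):
    """Shuffle the high 4 words of each 128-bit lane; low 4 words pass through."""
    out = list(v)
    for lane in (0, 8):
        for j in range(4):
            out[lane + 4 + j] = v[lane + 4 + ((imm >> (2 * j)) & 3)]
    return out

def vpermq(v, imm):
    """Permute the four 64-bit quadwords across the whole 256-bit register."""
    out = []
    for j in range(4):
        q = (imm >> (2 * j)) & 3
        out.extend(v[4 * q : 4 * q + 4])
    return out

def broadcast_word(v, i):
    """Hypothetical helper: broadcast word i via one in-lane shuffle + vpermq."""
    qword, pos = divmod(i, 4)          # which quadword, and position inside it
    shuffle = vpshufhw if qword % 2 else vpshuflw
    v = shuffle(v, pos * 0x55)         # replicate the 2-bit selector four times
    return vpermq(v, qword * 0x55)     # replicate the chosen quadword everywhere

x = list(range(100, 116))
for i in range(16):
    assert broadcast_word(x, i) == [x[i]] * 16
```

For index 15 this yields `vpshufhw` with immediate 255 followed by `vpermq` with immediate 255, matching the sequence proposed above; for index 8 it yields `vpshuflw` with immediate 0 followed by `vpermq` with immediate 170.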