Issue 129276
Summary: [X86] Suboptimal codegen for broadcasting a 16-bit vector element
Labels: new issue
Assignees:
Reporter: dzaima
This LLVM IR:
```llvm
define <16 x i16> @broadcast_sel15(<16 x i16> noundef %x) {
  %r = shufflevector <16 x i16> %x, <16 x i16> poison, <16 x i32> splat(i32 15)
  ret <16 x i16> %r
}
```
could be compiled to:
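```asm
        vpshufhw ymm0, ymm0, 255    ; per 128-bit lane: words 4..7 <- word 7 (so words 12..15 = element 15)
        vpermq   ymm0, ymm0, 255    ; every qword <- qword 3, i.e. element 15 everywhere
```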
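One way to check the suggested sequence is to model the two instructions on a plain list of 16 words. The sketch below is a simplified Python model written for this report (the functions `vpshufhw` and `vpermq` only mirror the mnemonics and are not a real library); it assumes the VEX.256 forms, where `vpshufhw` shuffles within each 128-bit lane and `vpermq` with an immediate can move whole qwords across lanes.

```python
# Model a 256-bit ymm register as a list of 16 16-bit words (element 0 first).

def vpshufhw(src, imm):
    """VEX.256 VPSHUFHW model: in each 128-bit lane, words 4..7 are picked
    from that lane's words 4..7 by the 2-bit fields of imm; words 0..3 pass
    through unchanged."""
    dst = list(src)
    for lane_base in (0, 8):
        for j in range(4):
            sel = (imm >> (2 * j)) & 3
            dst[lane_base + 4 + j] = src[lane_base + 4 + sel]
    return dst


def vpermq(src, imm):
    """VPERMQ-with-immediate model: destination qword j (4 words) is source
    qword (imm >> 2j) & 3, which may come from the other 128-bit lane."""
    dst = []
    for j in range(4):
        q = (imm >> (2 * j)) & 3
        dst.extend(src[4 * q: 4 * q + 4])
    return dst


x = list(range(100, 116))  # distinct placeholder values for elements 0..15
r = vpermq(vpshufhw(x, 255), 255)
print(r == [x[15]] * 16)  # True
```

Only `vpermq` crosses lanes here, and both instructions take their selectors as immediates, so no constant-pool load is involved.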

but LLVM produces:

```asm
        vpshufhw        ymm0, ymm0, 255
        vpbroadcastd    ymm1, dword ptr [rip + .LCPI15_0] ; 6 (index vector: 6 in all 8 dwords)
        vpermd          ymm0, ymm1, ymm0                  ; every dword <- dword 6 (words 12..13)
```
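LLVM's sequence computes the same result. The difference is that `vpermd` reads its indices from a vector register, so the index vector has to be materialized first (here with a broadcast load from the constant pool), while `vpermq` encodes its selector in an immediate. A minimal Python model of LLVM's sequence, assuming from the `; 6` annotation that `.LCPI15_0` holds the dword 6; `vpshufhw` and `vpermd` below are simplified illustrative models, not a real API:

```python
# Model a 256-bit ymm register as a list of 16 16-bit words (element 0 first).

def vpshufhw(src, imm):
    """VEX.256 VPSHUFHW model: per 128-bit lane, words 4..7 are picked from
    that lane's words 4..7 by the 2-bit fields of imm; words 0..3 pass through."""
    dst = list(src)
    for lane_base in (0, 8):
        for j in range(4):
            dst[lane_base + 4 + j] = src[lane_base + 4 + ((imm >> (2 * j)) & 3)]
    return dst


def vpermd(idx, src):
    """VPERMD dst, idx, src model: destination dword j (2 words) is source
    dword idx[j] & 7. The indices come from a register, not an immediate."""
    dst = []
    for j in range(8):
        d = idx[j] & 7
        dst.extend(src[2 * d: 2 * d + 2])
    return dst


x = list(range(100, 116))   # distinct placeholder values for elements 0..15
ymm0 = vpshufhw(x, 255)     # words 12..15 now all hold x[15]
ymm1 = [6] * 8              # assumed contents after vpbroadcastd from .LCPI15_0
r = vpermd(ymm1, ymm0)      # dword 6 = words 12..13 = (x[15], x[15])
print(r == [x[15]] * 16)    # True
```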
[All 16 cases as C intrinsics, with GCC for comparison](https://c.godbolt.org/z/r6ssxno7z), and [the direct LLVM IR](https://llvm.godbolt.org/z/eYvbrjvME).

Indices 12..15 are the most problematic. For index 8, the current `vextracti128` + `vpbroadcastw` codegen would also likely be better as `vpshuflw` + `vpermq`, which avoids having two cross-lane ops. The other indices are probably fine either way.
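The suggested pattern generalizes to any index: an in-lane `vpshuflw` (element in the low qword of its lane) or `vpshufhw` (high qword) replicates the element across its qword, and a single immediate `vpermq` then broadcasts that qword. The sketch below derives both immediates for all 16 indices and checks each against a splat. `pshuf_half` and `splat_plan` are hypothetical helpers written for illustration, and the sketch makes no claim about which sequence is cheapest for indices other than 8 and 12..15.

```python
# Model a 256-bit ymm register as a list of 16 16-bit words (element 0 first).

def pshuf_half(src, imm, high):
    """VEX.256 VPSHUFLW (high=False) / VPSHUFHW (high=True) model: in each
    128-bit lane, the selected 4-word half is shuffled within itself by the
    2-bit fields of imm; the other half passes through unchanged."""
    dst = list(src)
    base = 4 if high else 0
    for lane_base in (0, 8):
        for j in range(4):
            dst[lane_base + base + j] = src[lane_base + base + ((imm >> (2 * j)) & 3)]
    return dst


def vpermq(src, imm):
    """VPERMQ-with-immediate model: destination qword j is source qword
    (imm >> 2j) & 3."""
    dst = []
    for j in range(4):
        q = (imm >> (2 * j)) & 3
        dst.extend(src[4 * q: 4 * q + 4])
    return dst


def splat_plan(i):
    """Hypothetical helper: pick an in-lane word shuffle plus one vpermq
    that broadcasts element i (0..15) of a <16 x i16> vector."""
    lane, w = divmod(i, 8)          # 128-bit lane and word within that lane
    high = w >= 4                   # is the word in the lane's high qword?
    op = "vpshufhw" if high else "vpshuflw"
    shuf_imm = (w % 4) * 0x55       # same 2-bit selector in all four fields
    qword = 2 * lane + (1 if high else 0)
    return op, shuf_imm, qword * 0x55


x = list(range(100, 116))  # distinct placeholder values for elements 0..15
for i in range(16):
    op, shuf_imm, perm_imm = splat_plan(i)
    r = vpermq(pshuf_half(x, shuf_imm, op == "vpshufhw"), perm_imm)
    assert r == [x[i]] * 16
    print(f"{i:2}: {op} ymm0, ymm0, {shuf_imm:3}; vpermq ymm0, ymm0, {perm_imm:3}")
```

For index 15 this reproduces the 255/255 pair suggested in the report, and for index 8 it gives `vpshuflw` with immediate 0 followed by `vpermq` with immediate 170 (0xAA).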
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
