| Issue | 83840 |
| --- | --- |
| Summary | [X86][AVX] Recognise out of bounds AVX2 shift amounts |
| Labels | good first issue, backend:X86, missed-optimization |
| Assignees | |
| Reporter | RKSimon |
Pulled out of #39822, which was a bit too general.
Unlike the generic ISD SRA/SRL/SHL nodes, the AVX2 variable vector shift nodes X86ISD::VSRAV/VSRLV/VSHLV have defined behaviour for out-of-bounds shift amounts:
- VSRAV clamps the unsigned shift amount to (BITWIDTH-1).
- VSRLV/VSHLV return zero for unsigned shift amounts greater than (BITWIDTH-1).

So when lowering vector shifts, we should be able to fold away shift-amount clamping patterns and use the X86ISD node types directly (a sketch of a possible combine follows the example below).
e.g.
```ll
define <4 x i32> @ashr(<4 x i32> %sh, <4 x i32> %amt) {
%elt.min.i = tail call <4 x i32> @llvm.umin.v4i32(<4 x i32> %amt, <4 x i32> <i32 31, i32 31, i32 31, i32 31>)
%shr = ashr <4 x i32> %sh, %elt.min.i
ret <4 x i32> %shr
}
```
which currently compiles to:
```asm
ashr(int vector[4], unsigned int vector[4]):
vpbroadcastd xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = [31,31,31,31]
vpminud xmm1, xmm1, xmm2
vpsravd xmm0, xmm0, xmm1
ret
```
instead of the ideal:
```asm
ashr(int vector[4], unsigned int vector[4]):
vpsravd xmm0, xmm0, xmm1
ret
```
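For the arithmetic case the match is essentially `sra X, (umin Amt, BitWidth-1)` -> `VSRAV X, Amt`. Below is a minimal sketch of what such a DAG combine might look like in X86ISelLowering.cpp; it is illustrative only (the function name is made up, the umin operand order is assumed, and AVX2/type-legality checks are omitted):

```cpp
// Illustrative sketch only, not actual upstream code. Fold
//   sra X, (umin Amt, BitWidth-1)  -->  X86ISD::VSRAV X, Amt
// since VSRAV already clamps each element's shift amount to BitWidth-1.
static SDValue combineShiftToVSRAV(SDNode *N, SelectionDAG &DAG) {
  assert(N->getOpcode() == ISD::SRA && "Expected an arithmetic shift");
  EVT VT = N->getValueType(0);
  SDValue Amt = N->getOperand(1);
  if (Amt.getOpcode() != ISD::UMIN)
    return SDValue();
  // The clamp constant must be a splat of BitWidth-1.
  APInt SplatVal;
  if (!ISD::isConstantSplatVector(Amt.getOperand(1).getNode(), SplatVal) ||
      SplatVal != VT.getScalarSizeInBits() - 1)
    return SDValue();
  // The umin is redundant under VSRAV semantics; shift by the raw amount.
  return DAG.getNode(X86ISD::VSRAV, SDLoc(N), VT, N->getOperand(0),
                     Amt.getOperand(0));
}
```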
Logical shifts are trickier, as the clamp appears as a compare-and-select against zero rather than a umin, but they should also be foldable:
```ll
define <4 x i32> @lshr(<4 x i32> %sh, <4 x i32> %amt) {
%cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
%shr = lshr <4 x i32> %sh, %amt
%0 = select <4 x i1> %cmp.i, <4 x i32> %shr, <4 x i32> zeroinitializer
ret <4 x i32> %0
}

define <4 x i32> @lshr2(<4 x i32> %sh, <4 x i32> %amt) {
%cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
%0 = select <4 x i1> %cmp.i, <4 x i32> %sh, <4 x i32> zeroinitializer
%shr = lshr <4 x i32> %0, %amt
ret <4 x i32> %shr
}
```
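Both of these should ideally lower to a single `vpsrlvd xmm0, xmm0, xmm1`. The first (select-of-shift) form could be matched by a vselect combine along these lines; again a hedged sketch with a made-up function name, and with legality checks omitted:

```cpp
// Illustrative sketch only. Fold
//   vselect (setult Amt, BitWidth), (srl X, Amt), 0
//     -->  X86ISD::VSRLV X, Amt
// since VSRLV already yields zero for out-of-bounds shift amounts.
static SDValue combineVSelectToVSRLV(SDNode *N, SelectionDAG &DAG) {
  assert(N->getOpcode() == ISD::VSELECT && "Expected a vector select");
  SDValue Cond = N->getOperand(0);
  SDValue Shift = N->getOperand(1);
  SDValue Zero = N->getOperand(2);
  EVT VT = N->getValueType(0);
  if (!ISD::isBuildVectorAllZeros(Zero.getNode()) ||
      Shift.getOpcode() != ISD::SRL || Cond.getOpcode() != ISD::SETCC ||
      cast<CondCodeSDNode>(Cond.getOperand(2))->get() != ISD::SETULT)
    return SDValue();
  // The compare must test the same amount vector against a BitWidth splat.
  SDValue Amt = Shift.getOperand(1);
  APInt SplatVal;
  if (Cond.getOperand(0) != Amt ||
      !ISD::isConstantSplatVector(Cond.getOperand(1).getNode(), SplatVal) ||
      SplatVal != VT.getScalarSizeInBits())
    return SDValue();
  return DAG.getNode(X86ISD::VSRLV, SDLoc(N), VT, Shift.getOperand(0), Amt);
}
```

The @lshr2 form, where the select feeds the shifted value instead of consuming the shift, would need a similar match rooted at the SRL node.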