Issue: 147863
Summary: [LLVM] Suboptimal code generated for rounding right shifts on NEON, AArch64 SIMD, LSX, and RISC-V V
Labels: new issue
Assignees:
Reporter: johnplatts
Here is a link to a snippet for which LLVM generates suboptimal code on NEON, AArch64 SIMD, LSX, and RISC-V V:
https://alive2.llvm.org/ce/z/V82FtF
Alive2 verifies that transforming `(a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1))` into `(b == 0) ? a : ((a >> (b - 1)) - (a >> b))` is correct.
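For reference, here is a minimal scalar sketch of the two forms; the linked snippet operates on vectors of 32-bit lanes, the function names below are illustrative, and the pairing of the unsigned/signed variants with src1/tgt1 and src2/tgt2 is inferred from the urshl/srshl split in the listings:
```
#include <stdint.h>

/* Rounding logical shift right, written as shift plus extracted rounding bit. */
uint32_t rshr_u32(uint32_t a, uint32_t b) {
    return (a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1));
}

/* The equivalent form verified by Alive2. */
uint32_t rshr_u32_alt(uint32_t a, uint32_t b) {
    return (b == 0) ? a : ((a >> (b - 1)) - (a >> b));
}

/* The signed (arithmetic-shift) variants follow the same pattern. */
int32_t rshr_s32(int32_t a, uint32_t b) {
    return (a >> b) + ((b == 0) ? 0 : ((a >> (b - 1)) & 1));
}

int32_t rshr_s32_alt(int32_t a, uint32_t b) {
    return (b == 0) ? a : ((a >> (b - 1)) - (a >> b));
}
```
The two forms agree because `a >> (b - 1)` equals `2 * (a >> b)` plus the rounding bit, so subtracting `a >> b` leaves `a >> b` plus the rounding bit.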
The linked snippet can be further optimized to the following on ARMv7 NEON (arm-linux-gnueabihf):
```
src1: @ @src1
vneg.s32 q9, q1
vrshl.u32 q0, q0, q9
mov pc, lr
tgt1: @ @tgt1
vneg.s32 q9, q1
vrshl.u32 q0, q0, q9
mov pc, lr
src2: @ @src2
vneg.s32 q9, q1
vrshl.s32 q0, q0, q9
mov pc, lr
tgt2: @ @tgt2
vneg.s32 q9, q1
vrshl.s32 q0, q0, q9
mov pc, lr
```
The linked snippet can be further optimized to the following on AArch64:
```
src1: // @src1
neg v1.4s, v1.4s
urshl v0.4s, v0.4s, v1.4s
ret
tgt1: // @tgt1
neg v1.4s, v1.4s
urshl v0.4s, v0.4s, v1.4s
ret
src2: // @src2
neg v1.4s, v1.4s
srshl v0.4s, v0.4s, v1.4s
ret
tgt2: // @tgt2
neg v1.4s, v1.4s
srshl v0.4s, v0.4s, v1.4s
ret
```
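On both ARMv7 NEON and AArch64, this is exactly the lowering produced by the ACLE rounding-shift intrinsics; a minimal sketch, assuming `<arm_neon.h>` and illustrative wrapper names:
```
#include <arm_neon.h>

/* Rounding right shift = VRSHL (URSHL/SRSHL on AArch64) with a per-lane negated shift amount. */
uint32x4_t rshr_u32x4(uint32x4_t a, int32x4_t b) {
    return vrshlq_u32(a, vnegq_s32(b));
}

int32x4_t rshr_s32x4(int32x4_t a, int32x4_t b) {
    return vrshlq_s32(a, vnegq_s32(b));
}
```
These compile to the `vneg`+`vrshl` and `neg`+`urshl`/`srshl` sequences shown in the listings above.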
The linked snippet can be further optimized to the following on LoongArch64 with LSX:
```
src1: # @src1
vsrlr.w $vr0, $vr0, $vr1
ret
tgt1: # @tgt1
vsrlr.w $vr0, $vr0, $vr1
ret
src2: # @src2
vsrar.w $vr0, $vr0, $vr1
ret
tgt2: # @tgt2
vsrar.w $vr0, $vr0, $vr1
ret
```
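A sketch of the same operation through LSX intrinsics, assuming the builtin names `__lsx_vsrlr_w` and `__lsx_vsrar_w` from `<lsxintrin.h>` (names follow the GCC/Clang LSX builtin naming scheme; treat them as assumptions):
```
#include <lsxintrin.h>

/* Assumed builtins wrapping vsrlr.w / vsrar.w (rounding logical / arithmetic right shift). */
__m128i rshr_u32x4_lsx(__m128i a, __m128i b) {
    return __lsx_vsrlr_w(a, b);
}

__m128i rshr_s32x4_lsx(__m128i a, __m128i b) {
    return __lsx_vsrar_w(a, b);
}
```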
The linked snippet can be further optimized to the following on 64-bit RISC-V with the "V" extension:
```
src1: # @src1
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssrl.vv v8, v8, v9
ret
tgt1: # @tgt1
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssrl.vv v8, v8, v9
ret
src2: # @src2
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssra.vv v8, v8, v9
ret
tgt2: # @tgt2
csrwi vxrm, 0
vsetivli zero, 4, e32, m1, ta, ma
vssra.vv v8, v8, v9
ret
```
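A sketch with the RVV C intrinsics, assuming the v1.0 intrinsic names `__riscv_vssrl_vv_u32m1`/`__riscv_vssra_vv_i32m1` and the `__RISCV_VXRM_RNU` rounding mode (round-to-nearest-up, corresponding to `csrwi vxrm, 0` in the listing above); treat the exact names and argument order as assumptions:
```
#include <riscv_vector.h>

/* Assumed fixed-point shift intrinsics; rounding behavior comes from the explicit
   vxrm argument (round-to-nearest-up, matching vxrm = 0 above). */
vuint32m1_t rshr_u32_rvv(vuint32m1_t a, vuint32m1_t b, size_t vl) {
    return __riscv_vssrl_vv_u32m1(a, b, __RISCV_VXRM_RNU, vl);
}

vint32m1_t rshr_s32_rvv(vint32m1_t a, vuint32m1_t b, size_t vl) {
    return __riscv_vssra_vv_i32m1(a, b, __RISCV_VXRM_RNU, vl);
}
```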