| Issue |
171746
|
| Summary |
Another case where AVX-512 mask moves are being used when they are not needed
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
Whatcookie
|
Hello, here is a situation where the AVX-512 codegen is worse than the AVX2 codegen. In the godbolt link, I have included the LLVM IR that RPCS3 (PlayStation 3 emulator) produces for emulation of the SPU instructions GB and GBH. These instructions take the least significant bit from each element and place the gathered bits at the end of a vector register. Essentially it's like pmovmsk*, except that it places the result in a vector register instead of a GPR (there are no GPRs on SPUs), and it takes the least significant bit instead of the most significant bit.
For GB, using the mask registers is suboptimal, as it just adds an extra instruction: move to mask register, move to GPR, move to xmm.
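To make the intended semantics concrete, here is a minimal scalar sketch in C of the bit gather described above. The function names `gather_lsb32` and `gather_lsb16` are hypothetical and are not taken from RPCS3 or LLVM. The lane numbering is an assumption: lane i maps to bit i, the way the x86 movmsk instructions number lanes, and the SPU's own slot ordering is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (not RPCS3 code): gather the least
   significant bit of each 32-bit lane into one integer.
   Assumption: lane i maps to result bit i (x86 movmsk lane order). */
static uint32_t gather_lsb32(const uint32_t *lanes, size_t n) {
    assert(n <= 32); /* the result only has room for 32 gathered bits */
    uint32_t mask = 0;
    for (size_t i = 0; i < n; ++i)
        mask |= (lanes[i] & 1u) << i;
    return mask;
}

/* Same idea for 16-bit lanes, as a GBH-like gather over halfwords. */
static uint32_t gather_lsb16(const uint16_t *lanes, size_t n) {
    assert(n <= 32);
    uint32_t mask = 0;
    for (size_t i = 0; i < n; ++i)
        mask |= (uint32_t)(lanes[i] & 1u) << i;
    return mask;
}
```

Under these assumptions, the four word lanes {1, 2, 3, 0x80000000} give 0x5, since only lanes 0 and 2 have their low bit set. A pmovmsk-style gather of the most significant bits would instead give 0x8 for the same input.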
The AVX2 path (znver3 in the link) is optimal, as it just moves from xmm to gpr, then back again.
For GBH, since there is no pmovmskw, the existing AVX-512 output is ideal. Though note that RPCS3 uses a slick trick with gf2p8affineqb to emulate GBH with just 2 instructions: https://github.com/RPCS3/rpcs3/pull/14669
https://godbolt.org/z/d1zbaqEdq
In short:
AVX2 output (good):
    vmovmskps eax, xmm0
    vmovd xmm0, eax
AVX512 output (bad):
    vpmovd2m k0, xmm0
    kmovb eax, k0
    vmovd xmm0, eax
Thanks!