Issue | 128237
Summary | AVX-512 mask registers being used when they're not needed
Labels | new issue
Assignees |
Reporter | Whatcookie
I've been running into some odd assembly generated by RPCS3's SPU LLVM backend.
In short: the AVX-512 code is slower than the AVX2 code because compare-into-mask instructions are being used where the compare-into-vector instructions would be faster.
https://godbolt.org/z/dcjTKKaWj
In the FCGT3 function, both the AVX2 and AVX-512 targets use the compare-into-vector-register instructions, as expected. In the FCGT2 function, where the only difference is `fcmp ugt` in place of `fcmp ogt`, LLVM opts to use the mask registers on AVX-512. This is inconvenient because we're emulating instructions that compare into vector registers, so the mask result then has to be converted back into a vector.
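Compare into a vector register:
```
vpminud xmm0, xmm0, xmmword ptr [rdi + rcx]
vcmpnleps xmm0, xmm0, xmmword ptr [rdi + rax]
```
AVX-512 compare into mask register `k0`, followed by `vpmovm2d` to expand the mask back into a vector:
```
vpminud xmm0, xmm0, dword ptr [rip + .LCPI1_0]{1to4}
vcmpnleps k0, xmm0, xmmword ptr [rdi + rax]
vpmovm2d xmm0, k0
```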