Issue 128237
Summary AVX-512 Mask registers being used when it's not needed
Labels new issue
Assignees
Reporter Whatcookie
I've been running into some odd assembly generated by RPCS3's SPU LLVM backend.

In short: the AVX-512 code is slower than the AVX2 code. LLVM emits compare-into-mask instructions where the compare-into-vector instructions would be faster.

https://godbolt.org/z/dcjTKKaWj

In the FCGT3 function, both the AVX2 and AVX-512 targets use the compare-into-vector-register instructions, as expected. The FCGT2 function differs only in using `fcmp ugt` in place of `fcmp ogt`. There, LLVM opts to use the mask registers instead. This is inconvenient because we're emulating instructions that compare into the vector registers, so the mask has to be converted back into a vector afterwards.
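To make the `ogt`/`ugt` distinction concrete, here is a minimal scalar C sketch of the two LLVM predicates. The function names are hypothetical and this is not RPCS3 code. `fcmp ogt` is false whenever either operand is NaN. `fcmp ugt` is true in that case. For all non-NaN inputs the two agree, so NaN handling is the only semantic difference between FCGT2 and FCGT3.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar model of LLVM's `fcmp ogt`:
   true only if neither operand is NaN and a > b.
   C's built-in > already has these "ordered" semantics. */
bool fcmp_ogt(float a, float b) {
    return a > b;
}

/* Hypothetical scalar model of LLVM's `fcmp ugt`:
   true if either operand is NaN, or if a > b.
   Written as "not (a <= b)": <= is false whenever a NaN is involved,
   so negating it yields true for unordered inputs. This is the same
   "not less-or-equal" predicate named by the vcmpnleps mnemonic. */
bool fcmp_ugt(float a, float b) {
    return !(a <= b);
}

/* Spot-check the one case where the two predicates disagree: NaN inputs. */
void check_ogt_vs_ugt(void) {
    assert(fcmp_ogt(2.0f, 1.0f) && fcmp_ugt(2.0f, 1.0f));  /* ordered: agree */
    assert(!fcmp_ogt(NAN, 1.0f) && fcmp_ugt(NAN, 1.0f));   /* unordered: differ */
}
```

Writing `ugt` as a negated `<=` also explains why both listings use `vcmpnleps` ("compare not less-than-or-equal").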

```
        vpminud xmm0, xmm0, xmmword ptr [rdi + rcx]
        vcmpnleps       xmm0, xmm0, xmmword ptr [rdi + rax]
```

```
        vpminud xmm0, xmm0, dword ptr [rip + .LCPI1_0]{1to4}
        vcmpnleps       k0, xmm0, xmmword ptr [rdi + rax]
        vpmovm2d        xmm0, k0
```
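As a rough model of the data flow in the two listings, the portable C sketch below performs a four-lane `ugt` compare in two ways:

- writing all-ones/all-zeros 32-bit lanes directly, as `vcmpnleps xmm0, ...` does;
- building a one-bit-per-lane mask, as `vcmpnleps k0, ...` does, and then widening it back with a `vpmovm2d`-style step.

The function names and the `LANES` constant are hypothetical. The sketch models only what each instruction produces, not its performance, and it leaves out the `vpminud` step entirely.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* four 32-bit float lanes, as in an xmm register */

/* One mask bit per lane must fit in the 8-bit mask type used below. */
static_assert(LANES <= 8, "mask must have one bit per lane");

/* Path 1, modelled on `vcmpnleps xmm0, ...`: the compare writes each
   32-bit lane directly as all-ones (true) or all-zeros (false). */
void cmp_ugt_into_vector(const float a[LANES], const float b[LANES],
                         uint32_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = !(a[i] <= b[i]) ? UINT32_MAX : 0u;  /* ugt == !(a <= b) */
}

/* Path 2a, modelled on `vcmpnleps k0, ...`: the compare produces a
   scalar bitmask with bit i set when lane i compares true. */
uint8_t cmp_ugt_into_mask(const float a[LANES], const float b[LANES]) {
    uint8_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (!(a[i] <= b[i]))
            k |= (uint8_t)(1u << i);
    return k;
}

/* Path 2b, modelled on `vpmovm2d xmm0, k0`: widen each mask bit back
   into a full 32-bit lane. This is the extra step the mask path needs
   when the consumer expects a vector result. */
void mask_to_vector(uint8_t k, uint32_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = ((k >> i) & 1u) ? UINT32_MAX : 0u;
}
```

Both paths end with identical lane values. The mask path simply needs an additional conversion instruction to get there.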
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs