Issue 144266
Summary [X86] Delaying widening results in an unnecessary `vpmovsxwd` copy
Labels new issue
Assignees
Reporter dzaima
This C code:
```c
#include <immintrin.h>
#include <stdint.h>

__m256i iter(int16_t* src1p) {
    __m256i ten = _mm256_set1_epi32(10);
    __m256i wload = _mm256_cvtepi16_epi32(_mm_loadu_si128((void*)src1p));
    __m256i mask = _mm256_cmpgt_epi32(wload, ten);
    return _mm256_add_epi32(wload, mask);
}
```
compiled with `-O3 -march=haswell`, results in:
```asm
iter:
        vmovdqu xmm0, xmmword ptr [rdi]
        vpmovsxwd       ymm1, xmm0
        vpcmpgtw        xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpmovsxwd       ymm0, xmm0
        vpaddd  ymm0, ymm0, ymm1
        ret
```
but it could be
```asm
iter:
        vpmovsxwd       ymm0, xmmword ptr [rdi]
        vpbroadcastd    ymm1, dword ptr [rip + .LCPI0_0]
        vpcmpgtd        ymm1, ymm0, ymm1
        vpaddd  ymm0, ymm1, ymm0
        ret
        ret
```
which avoids the second `vpmovsxwd` and lets the remaining one take its memory operand directly instead of needing a separate `vmovdqu` load.
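
For reference, here is a scalar sketch of one lane of each version. It is only an illustration of why the two sequences should give the same result (so the difference is cost, not correctness). The function names are made up for this example, and the comments map each step to the instruction it is meant to model.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of the source as written: widen first, then compare.
   (Hypothetical helper name, for illustration only.) */
static int32_t lane_widen_then_compare(int16_t x) {
    int32_t w = (int32_t)x;               /* models vpmovsxwd */
    int32_t mask = (w > 10) ? -1 : 0;     /* models vpcmpgtd: all-ones when true */
    return w + mask;                      /* models vpaddd */
}

/* One lane of the currently emitted code: compare at 16 bits,
   then sign-extend both the data and the mask. */
static int32_t lane_compare_then_widen(int16_t x) {
    int16_t mask16 = (x > 10) ? -1 : 0;   /* models vpcmpgtw */
    int32_t mask = (int32_t)mask16;       /* models the second vpmovsxwd */
    int32_t w = (int32_t)x;               /* models the first vpmovsxwd */
    return w + mask;                      /* models vpaddd */
}

/* Exhaustively compare the two lane models over every int16_t input. */
static void check_equivalence(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        int16_t x = (int16_t)v;
        assert(lane_widen_then_compare(x) == lane_compare_then_widen(x));
    }
}
```

Calling `check_equivalence()` walks all 65536 possible lane values; sign-extending a 16-bit all-ones or all-zeros mask yields the same 32-bit mask as comparing after widening, so narrowing the compare is legal but costs the extra extend.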

https://godbolt.org/z/Ezrf9YbYn
