Issue 125198
Summary [clang] [SIMD] Surprising codegen for AVX2 is_less_equal
Labels clang
Assignees
Reporter jfalcou
    While trying to optimize [EVE](https://github.com/jfalcou/eve) codegen, we found this discrepancy between Clang and GCC:

https://godbolt.org/z/nj983Txh6

```c++
#include <immintrin.h>

auto test(__m256i a, __m256i b)
{
  return _mm256_cmpeq_epi32(_mm256_min_epi32(a, b), a);
}
```

G++:
```x86asm
test(long long __vector(4), long long __vector(4)):
        vpminsd         ymm1, ymm0, ymm1
        vpcmpeqd        ymm0, ymm1, ymm0
        ret
```

Clang:
```x86asm
test(long long vector[4], long long vector[4]):
        vpcmpgtd        ymm0, ymm0, ymm1
        vpcmpeqd        ymm1, ymm1, ymm1
        vpxor           ymm0, ymm0, ymm1
        ret
```
```
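
For reference, both sequences appear to compute the same per-lane mask: `min(a, b) == a` holds exactly when `a > b` does not, and Clang's `vpcmpeqd ymm1, ymm1, ymm1` only materializes an all-ones register for the final `vpxor`. Below is a minimal scalar sketch of one signed 32-bit lane. The helper names `lane_min_eq`, `lane_not_gt`, and `models_agree` are made up for illustration and are not part of EVE or either compiler.

```c++
#include <algorithm>
#include <cstdint>

// Scalar model of a single signed 32-bit lane (illustrative helpers only).

// Models the GCC sequence: vpminsd followed by vpcmpeqd.
inline std::uint32_t lane_min_eq(std::int32_t a, std::int32_t b)
{
  return std::min(a, b) == a ? 0xFFFFFFFFu : 0u;
}

// Models the Clang sequence: vpcmpgtd, then vpxor with an all-ones lane.
inline std::uint32_t lane_not_gt(std::int32_t a, std::int32_t b)
{
  std::uint32_t gt = a > b ? 0xFFFFFFFFu : 0u;
  return gt ^ 0xFFFFFFFFu;
}

// Returns true if both models agree on every pair of edge-case inputs.
inline bool models_agree()
{
  const std::int32_t samples[] = {INT32_MIN, -1, 0, 1, 42, INT32_MAX};
  for (std::int32_t a : samples)
    for (std::int32_t b : samples)
      if (lane_min_eq(a, b) != lane_not_gt(a, b)) return false;
  return true;
}
```

This only checks a handful of edge cases, so it is a sanity check of the equivalence rather than a proof, and it says nothing about which form is cheaper.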

After looking up micro-op timings with LLVM-MCA and checking uops.info, both sequences should perform the same. However, the GCC code is 3 bytes shorter than Clang's output.

Is there a non-trivial reason for this code generation, or is it an oversight?
