| Field | Value |
| --- | --- |
| Issue | 125198 |
| Summary | [clang] [SIMD] Surprising codegen for AVX2 is_less_equal |
| Labels | clang |
| Assignees | |
| Reporter | jfalcou |

While trying to optimize [EVE](https://github.com/jfalcou/eve) codegen, we found the following discrepancy between Clang and GCC:
https://godbolt.org/z/nj983Txh6
```c++
#include <immintrin.h>
auto test(__m256i a, __m256i b)
{
return _mm256_cmpeq_epi32(_mm256_min_epi32(a, b), a);
}
```
G++:
```x86asm
test(long long __vector(4), long long __vector(4)):
vpminsd ymm1, ymm0, ymm1
vpcmpeqd ymm0, ymm1, ymm0
ret
```
Clang:
```x86asm
test(long long vector[4], long long vector[4]):
vpcmpgtd ymm0, ymm0, ymm1
vpcmpeqd ymm1, ymm1, ymm1
vpxor ymm0, ymm0, ymm1
ret
```
After checking micro-op timings with LLVM-MCA and uops.info, both sequences should perform the same. However, the GCC code is 3 bytes shorter than what Clang generates.
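For reference, here is a rough sketch of where the 3 bytes come from. The byte values are hand-assembled VEX encodings and are an assumption on my side, not copied from the Godbolt output; an assembler may pick a different but equally long encoding. The names `gcc_body`, `clang_body` and `check_sizes` are made up for this illustration, and `ret` is left out since both versions have it.

```c++
#include <cassert>

// Assumed encodings of the two function bodies, without the trailing ret.
constexpr unsigned char gcc_body[] = {
  0xC4, 0xE2, 0x7D, 0x39, 0xC9, // vpminsd  ymm1, ymm0, ymm1 (0F38 map, needs 3-byte VEX)
  0xC5, 0xF5, 0x76, 0xC0        // vpcmpeqd ymm0, ymm1, ymm0 (2-byte VEX)
};

constexpr unsigned char clang_body[] = {
  0xC5, 0xFD, 0x66, 0xC1,       // vpcmpgtd ymm0, ymm0, ymm1 (2-byte VEX)
  0xC5, 0xF5, 0x76, 0xC9,       // vpcmpeqd ymm1, ymm1, ymm1 (2-byte VEX)
  0xC5, 0xFD, 0xEF, 0xC1        // vpxor    ymm0, ymm0, ymm1 (2-byte VEX)
};

// Checks the size difference between the two assumed encodings.
inline void check_sizes()
{
  assert(sizeof(clang_body) - sizeof(gcc_body) == 3);
}
```

So under these assumptions the GCC body is 5 + 4 bytes and the Clang body is 4 + 4 + 4 bytes.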
Is there a non-trivial reason for this code generation, or is it an oversight?
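To make the comparison concrete, here is a minimal scalar sketch of what each sequence computes per 32-bit lane, as far as I can tell: GCC keeps `min(a, b) == a`, while Clang appears to rewrite it as `~(a > b)`. The helper names (`mask`, `le_via_min`, `le_via_not_gt`, `models_agree`, `check_models`) are made up for this illustration and do not come from EVE or from either compiler.

```c++
#include <cassert>
#include <cstdint>
#include <limits>

// One lane of an AVX2 compare result: all ones for true, zero for false.
constexpr std::int32_t mask(bool b) { return b ? -1 : 0; }

// Model of the GCC sequence: vpminsd followed by vpcmpeqd.
constexpr std::int32_t le_via_min(std::int32_t a, std::int32_t b)
{
  return mask((a < b ? a : b) == a);
}

// Model of the Clang sequence: vpcmpgtd, then vpxor with an all-ones register.
constexpr std::int32_t le_via_not_gt(std::int32_t a, std::int32_t b)
{
  return ~mask(a > b);
}

// True when both models match a <= b on a handful of edge values.
constexpr bool models_agree()
{
  const std::int32_t samples[] = {
    std::numeric_limits<std::int32_t>::min(), -1, 0, 1,
    std::numeric_limits<std::int32_t>::max()
  };
  for (std::int32_t a : samples)
    for (std::int32_t b : samples)
      if (le_via_min(a, b) != mask(a <= b) || le_via_not_gt(a, b) != mask(a <= b))
        return false;
  return true;
}

// Runtime check of the same property.
inline void check_models() { assert(models_agree()); }
```

Both forms give the same signed `a <= b` mask on these samples, so the difference between the two compilers looks like a pure instruction-selection choice rather than a semantic one.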
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs