Issue 137700
Summary [AVX-512] `vpsubd a, b, vpmovm2d` can be done via a masked `vpsubd`
Labels new issue
Assignees
Reporter dzaima
This code, compiled with `-O3 -march=znver5`:
```c
#include<stdint.h>
#include<immintrin.h>
__m512i count_gt100(uint32_t* in, size_t count) {
    __m512i neg_one = _mm512_set1_epi32(-1);
    __m512i acc = _mm512_set1_epi32(0);
    #pragma clang loop unroll(disable) // just to reduce noise
    for (size_t i = 0; i < count; i++) {
        __m512i val = _mm512_loadu_si512(in);
        __mmask16 mask = _mm512_cmpgt_epi32_mask(val, _mm512_set1_epi32(100));
        acc = _mm512_mask_sub_epi32(acc, mask, acc, neg_one);
        in += 16;
    }
    return acc;
}
```
contains:
```asm
        vpmovm2d zmm2, k0
        vpsubd  zmm0, zmm0, zmm2
```
whereas the code the intrinsics ask for, and what gcc actually generates, needs just one instruction for this step: a masked `vpsubd`.

https://godbolt.org/z/MGjfGq96h
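
For reference, here is a small scalar model of why the two forms should be interchangeable. It is only a sketch: it emulates the 16 lanes with plain C arrays rather than using AVX-512 instructions, and the function names (`sub_via_expanded_mask`, `sub_masked`, `forms_agree`) are made up for illustration. Expanding the mask with `vpmovm2d` gives all-ones (that is, -1) in the selected lanes and 0 elsewhere, so subtracting it matches a masked subtraction of the `neg_one` constant.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16

/* Model of `vpmovm2d` followed by an unmasked `vpsubd`:
   each mask bit becomes 0 or all-ones, which is then subtracted. */
void sub_via_expanded_mask(uint32_t acc[LANES], uint16_t mask) {
    for (size_t i = 0; i < LANES; i++) {
        uint32_t expanded = ((mask >> i) & 1u) ? UINT32_MAX : 0u;
        acc[i] -= expanded; /* unsigned wraparound: x - 0xFFFFFFFF == x + 1 */
    }
}

/* Model of a merge-masked `vpsubd` with a -1 operand:
   selected lanes get acc - (-1), the others keep their old value. */
void sub_masked(uint32_t acc[LANES], uint16_t mask) {
    const uint32_t neg_one = UINT32_MAX;
    for (size_t i = 0; i < LANES; i++) {
        if ((mask >> i) & 1u) {
            acc[i] -= neg_one;
        }
    }
}

/* Returns 1 if both models produce identical lanes for the given mask. */
int forms_agree(uint16_t mask) {
    uint32_t a[LANES], b[LANES];
    for (size_t i = 0; i < LANES; i++) {
        /* arbitrary starting values, including one that wraps around */
        a[i] = b[i] = (uint32_t)i * 0x11111111u;
    }
    sub_via_expanded_mask(a, mask);
    sub_masked(b, mask);
    for (size_t i = 0; i < LANES; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling `forms_agree` with any 16-bit mask should return 1, which is the equivalence the single masked instruction relies on.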