https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85832

            Bug ID: 85832
           Summary: [AVX512] possible shorter code when comparing with
                    vector of zeros
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wojciech_mula at poczta dot onet.pl
  Target Milestone: ---

Consider this simple function, which yields a mask of the zero elements:

---cat cmp.c---
#include <immintrin.h>

int fun(__m512i x) {
    return _mm512_cmpeq_epi32_mask(x, _mm512_setzero_si512());
}
---eof

$ gcc --version
gcc (Debian 7.3.0-16) 7.3.0

$ gcc -O2 -S -mavx512f cmp.c && cat cmp.s
fun:
        vpxord  %zmm1, %zmm1, %zmm1     # <<< HERE
        vpcmpeqd        %zmm1, %zmm0, %k1   # <<<
        kmovw   %k1, %eax
        vzeroupper
        ret

GCC 8.1.0 generates the same code (checked on godbolt.org).

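The VPXORD/VPCMPEQD pair can be replaced with a single
VPTESTNMD %zmm0, %zmm0, %k1.  VPTESTNMD sets each bit of k1 when the
corresponding element of zmm0 AND zmm0 is zero, so it is sufficient for
comparing zmm0 with zeros.  (VPTESTMD sets the bit when the AND is non-zero,
so it yields the complementary mask and fits a not-equal comparison.)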
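
To make the per-element semantics concrete, here is a minimal scalar sketch
in plain C.  It is only a model of the documented instruction behaviour, not
GCC code; the helper names (model_vpcmpeqd, model_vptestmd, model_vptestnmd,
zero_compare_matches) are made up for illustration, and it runs without
AVX-512 hardware.

```c
#include <stdint.h>

#define LANES 16  /* a 512-bit register holds sixteen 32-bit elements */

/* Model of VPCMPEQD: mask bit i is set when a[i] == b[i]. */
static uint16_t model_vpcmpeqd(const uint32_t a[LANES], const uint32_t b[LANES]) {
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (a[i] == b[i]) k |= (uint16_t)(1u << i);
    return k;
}

/* Model of VPTESTMD: mask bit i is set when (a[i] & b[i]) != 0. */
static uint16_t model_vptestmd(const uint32_t a[LANES], const uint32_t b[LANES]) {
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if ((a[i] & b[i]) != 0) k |= (uint16_t)(1u << i);
    return k;
}

/* Model of VPTESTNMD: mask bit i is set when (a[i] & b[i]) == 0. */
static uint16_t model_vptestnmd(const uint32_t a[LANES], const uint32_t b[LANES]) {
    uint16_t k = 0;
    for (int i = 0; i < LANES; i++)
        if ((a[i] & b[i]) == 0) k |= (uint16_t)(1u << i);
    return k;
}

/* Returns 1 when testing x against itself reproduces the compare-with-zero
   mask (VPTESTNMD) and its 16-bit complement (VPTESTMD). */
static int zero_compare_matches(const uint32_t x[LANES]) {
    const uint32_t zero[LANES] = {0};
    uint16_t eq = model_vpcmpeqd(x, zero);
    return model_vptestnmd(x, x) == eq &&
           model_vptestmd(x, x) == (uint16_t)~eq;
}
```

Since x AND x is x, the test-against-itself forms need no zeroed register.
At the intrinsics level the same comparison can be written as
_mm512_testn_epi32_mask(x, x).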
