https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232
--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #0)
> This is expansion of PR 113609 which showed when I improved phiopt's factor
> operations to handle more than just 1 operand operations.
>
> New reduced testcase that fails to use kortestw (even without my phiopt
> improvements):
> ```
> #include <immintrin.h>
>
> int
> cmp_vector_je_mask64_t(__m512i a, __m512i b, int c, int d) {
>   __mmask64 k = _mm512_cmpeq_epi8_mask (a, b);
>   return k == (__mmask64) -1 ? c : d;
> }
> ```
>
> GCC currently produces:
> ```
>         vpcmpb  $0, %zmm1, %zmm0, %k0
>         kmovq   %k0, %rax
>         cmpq    $-1, %rax
>         movl    %edi, %eax
>         cmovne  %esi, %eax
>         ret
> ```
>
> While LLVM produces:
> ```
>         movl    %edi, %eax
>         vpcmpneqd       %zmm1, %zmm0, %k0
>         kortestw        %k0, %k0
>         cmovnel %esi, %eax
>         vzeroupper
>         retq
> ```

Now GCC generates

        .cfi_startproc
        vpcmpb  $0, %zmm1, %zmm0, %k0
        movl    %edi, %eax
        kortestq        %k0, %k0
        cmovnc  %esi, %eax
        ret
        .cfi_endproc
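
For readers less familiar with the flag behaviour, here is a rough scalar sketch of why the `kortestq` + `cmovnc` pair above can stand in for the `kmovq` + `cmpq $-1` + `cmovne` sequence. It relies on `kortest` setting CF when the OR of its two mask operands is all ones (and ZF when it is all zeros). The names `kortest_flags`, `model_kortestq`, `select_via_cmp`, `select_via_kortest` and `check_same_result` are made up for illustration and are not GCC internals or intrinsics; the mask is modelled as a plain `uint64_t`, so no AVX-512 hardware is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of the flags written by `kortestq k1, k2`. */
struct kortest_flags {
    bool zf; /* set when (k1 | k2) is all zeros */
    bool cf; /* set when (k1 | k2) is all ones  */
};

struct kortest_flags model_kortestq(uint64_t k1, uint64_t k2)
{
    uint64_t ored = k1 | k2;
    struct kortest_flags f = { ored == 0, ored == UINT64_MAX };
    return f;
}

/* Models the older sequence: kmovq + cmpq $-1 + cmovne. */
int select_via_cmp(uint64_t k, int c, int d)
{
    return k == UINT64_MAX ? c : d;
}

/* Models the newer sequence: movl + kortestq %k0, %k0 + cmovnc. */
int select_via_kortest(uint64_t k, int c, int d)
{
    int result = c;                   /* movl   %edi, %eax            */
    if (!model_kortestq(k, k).cf)     /* cmovnc %esi, %eax (CF == 0)  */
        result = d;
    return result;
}

/* Asserts that both models pick the same value for one mask. */
void check_same_result(uint64_t k, int c, int d)
{
    assert(select_via_cmp(k, c, d) == select_via_kortest(k, c, d));
}
```

Calling `check_same_result` with an all-ones mask, a zero mask, and any partially set mask should pass in each case, since CF is set only for the all-ones mask, which is exactly the `k == (__mmask64) -1` condition in the testcase.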