https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #0)
> This is expansion of PR 113609 which showed when I improved phiopt's factor
> operations to handle more than just 1 operand operations.
> 
> New reduced testcase that fails to use kortestw (even without my phiopt
> improvements):
> ```
> #include <immintrin.h>
> 
> int
> cmp_vector_je_mask64_t(__m512i a, __m512i b, int c, int d) {
>     __mmask64 k = _mm512_cmpeq_epi8_mask (a, b);
>     return k == (__mmask64) -1 ? c : d;
> }
> ```
> 
> GCC currently produces:
> ```
>         vpcmpb  $0, %zmm1, %zmm0, %k0
>         kmovq   %k0, %rax
>         cmpq    $-1, %rax
>         movl    %edi, %eax
>         cmovne  %esi, %eax
>         ret
> ```
> 
> While LLVM produces:
> ```
>         movl    %edi, %eax
>         vpcmpneqd       %zmm1, %zmm0, %k0
>         kortestw        %k0, %k0
>         cmovnel %esi, %eax
>         vzeroupper
>         retq
> ```

GCC now generates:

        .cfi_startproc
        vpcmpb  $0, %zmm1, %zmm0, %k0
        movl    %edi, %eax
        kortestq        %k0, %k0
        cmovnc  %esi, %eax
        ret
        .cfi_endproc
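
For readers unfamiliar with the flag usage: kortestq sets CF when the OR of its
two mask operands is all ones and ZF when it is all zeros, so the compare
against -1 can be read straight from the carry flag. Below is a rough C model
of that sequence, as a sketch only. The names model_kortestq, select_via_carry
and self_check are made up for illustration and are not GCC or intrinsic APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags written by `kortestq %k1, %k2`. */
struct kortest_flags {
    bool zf; /* set when (k1 | k2) is all zeros */
    bool cf; /* set when (k1 | k2) is all ones  */
};

struct kortest_flags model_kortestq(uint64_t k1, uint64_t k2) {
    uint64_t t = k1 | k2;
    struct kortest_flags f = { .zf = (t == 0), .cf = (t == UINT64_MAX) };
    return f;
}

/* Mirrors: movl %edi, %eax; kortestq %k0, %k0; cmovnc %esi, %eax */
int select_via_carry(uint64_t k, int c, int d) {
    int result = c;                  /* movl   %edi, %eax */
    if (!model_kortestq(k, k).cf)    /* kortestq %k0, %k0 */
        result = d;                  /* cmovnc %esi, %eax */
    return result;
}

/* Checks the model against the source expression `k == -1 ? c : d`. */
void self_check(void) {
    const uint64_t samples[] = { 0, 1, UINT64_MAX - 1, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint64_t k = samples[i];
        assert(select_via_carry(k, 10, 20) == (k == (uint64_t) -1 ? 10 : 20));
    }
}
```

This only models the flag logic on a plain 64-bit integer; it does not execute
any AVX-512 instruction.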
