https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104610
--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:4afbebcdc5780d28e52b7d65643e462c7c3882ce

commit r14-2159-g4afbebcdc5780d28e52b7d65643e462c7c3882ce
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Wed Jun 28 11:11:34 2023 +0100

    i386: Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).

    This patch fixes some very odd (unanticipated) code generation by
    compare_by_pieces with -m32 -mavx, since the recent addition of the
    cbranchoi4 pattern.  The issue is that cbranchoi4 is available with
    TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT,
    which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
    -m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
    comparisons to 256-bits before performing PTEST.

    This patch fixes this by providing a cbranchti4 pattern that's
    available with either TARGET_64BIT or TARGET_SSE4_1.

    For the test case below (again from PR 104610):

    int foo(char *a)
    {
      static const char t[] = "0123456789012345678901234567890";
      return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
    }

    GCC with -m32 -O2 -mavx currently produces the bonkers:

    foo:
            pushl   %ebp
            movl    %esp, %ebp
            andl    $-32, %esp
            subl    $64, %esp
            movl    8(%ebp), %eax
            vmovdqa .LC0, %xmm4
            movl    $0, 48(%esp)
            vmovdqu (%eax), %xmm2
            movl    $0, 52(%esp)
            movl    $0, 56(%esp)
            movl    $0, 60(%esp)
            movl    $0, 16(%esp)
            movl    $0, 20(%esp)
            movl    $0, 24(%esp)
            movl    $0, 28(%esp)
            vmovdqa %xmm2, 32(%esp)
            vmovdqa %xmm4, (%esp)
            vmovdqa (%esp), %ymm5
            vpxor   32(%esp), %ymm5, %ymm0
            vptest  %ymm0, %ymm0
            jne     .L2
            vmovdqu 16(%eax), %xmm7
            movl    $0, 48(%esp)
            movl    $0, 52(%esp)
            vmovdqa %xmm7, 32(%esp)
            vmovdqa .LC1, %xmm7
            movl    $0, 56(%esp)
            movl    $0, 60(%esp)
            movl    $0, 16(%esp)
            movl    $0, 20(%esp)
            movl    $0, 24(%esp)
            movl    $0, 28(%esp)
            vmovdqa %xmm7, (%esp)
            vmovdqa (%esp), %ymm1
            vpxor   32(%esp), %ymm1, %ymm0
            vptest  %ymm0, %ymm0
            je      .L6
    .L2:
            movl    $1, %eax
            xorl    $1, %eax
            vzeroupper
            leave
            ret
    .L6:
            xorl    %eax, %eax
            xorl    $1, %eax
            vzeroupper
            leave
            ret

    with this patch, we now generate the (slightly) more sensible:

    foo:
            vmovdqa .LC0, %xmm0
            movl    4(%esp), %eax
            vpxor   (%eax), %xmm0, %xmm0
            vptest  %xmm0, %xmm0
            jne     .L2
            vmovdqa .LC1, %xmm0
            vpxor   16(%eax), %xmm0, %xmm0
            vptest  %xmm0, %xmm0
            je      .L5
    .L2:
            movl    $1, %eax
            xorl    $1, %eax
            ret
    .L5:
            xorl    %eax, %eax
            xorl    $1, %eax
            ret

    2023-06-28  Roger Sayle  <ro...@nextmovesoftware.com>

    gcc/ChangeLog
            * config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
            for TImode comparisons on 32-bit architectures.
            * config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
            SWIM1248x to exclude/avoid TImode being conditional on -m64.
            (cbranchti4): New define_expand for TImode on both TARGET_64BIT
            and/or with TARGET_SSE4_1.
            * config/i386/predicates.md (ix86_timode_comparison_operator):
            New predicate that depends upon TARGET_64BIT.
            (ix86_timode_comparison_operand): Likewise.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/pieces-memcmp-2.c: New test case.
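
The new test case gcc.target/i386/pieces-memcmp-2.c is not quoted in the
commit message above. Below is a minimal sketch of what such a test might
look like, assuming the usual DejaGnu conventions (dg-do compile restricted
to ia32, -O2 -mavx in dg-options, and scan-assembler checks that PTEST stays
on %xmm registers rather than being widened to %ymm); the actual committed
test may differ.

/* { dg-do compile { target ia32 } } */
/* { dg-options "-O2 -mavx" } */

int foo(char *a)
{
  /* A 32-byte constant operand, so memcmp is expanded inline by
     compare_by_pieces as two 128-bit comparisons.  */
  static const char t[] = "0123456789012345678901234567890";
  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

/* With the cbranchti4 pattern the comparisons should remain 128-bit.  */
/* { dg-final { scan-assembler "vptest\t%xmm" } } */
/* { dg-final { scan-assembler-not "%ymm" } } */

The same behaviour can be checked by hand by compiling the foo function with
gcc -m32 -O2 -mavx -S before and after the patch and inspecting whether
vptest operates on %ymm or %xmm registers, as in the two listings above.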