https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117093

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jennifer Schmitz <jschm...@gcc.gnu.org>:

https://gcc.gnu.org/g:c83e2d47574fd9a21f257e0f0d7e350c3f1b0618

commit r15-5324-gc83e2d47574fd9a21f257e0f0d7e350c3f1b0618
Author: Jennifer Schmitz <jschm...@nvidia.com>
Date:   Mon Nov 4 07:56:09 2024 -0800

    match.pd: Fold vec_perm with view_convert

    This patch improves the codegen for the following test case:
    uint64x2_t foo (uint64x2_t r) {
        uint32x4_t a = vreinterpretq_u32_u64 (r);
        uint32_t t;
        t = a[0]; a[0] = a[1]; a[1] = t;
        t = a[2]; a[2] = a[3]; a[3] = t;
        return vreinterpretq_u64_u32 (a);
    }
    from (-O1):
    foo:
            mov     v31.16b, v0.16b
            ins     v0.s[0], v0.s[1]
            ins     v0.s[1], v31.s[0]
            ins     v0.s[2], v31.s[3]
            ins     v0.s[3], v31.s[2]
            ret
    to:
    foo:
            rev64   v0.4s, v0.4s
            ret

    This is achieved by extending the following match.pd pattern to account
    for type differences between @0 and @1 due to view converts.
    /* Simplify vector inserts of other vector extracts to a permute.  */
    (simplify
     (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)

    The patch was bootstrapped and regtested on aarch64-linux-gnu and
    x86_64-linux-gnu, no regression.
    OK for mainline?

    Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
    Co-authored-by: Richard Biener <rguent...@suse.de>

    gcc/
            PR tree-optimization/117093
            * match.pd: Extend
            (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos) to allow
            type differences between @0 and @1 due to view converts.

    gcc/testsuite/
            PR tree-optimization/117093
            * gcc.dg/tree-ssa/pr117093.c: New test.
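
As an illustration of why the four lane inserts in foo can collapse into one permute, here is a minimal portable sketch in plain C. It is not taken from the patch or from the new test. Plain uint32_t arrays stand in for the NEON vector, the helper names are hypothetical, and the selector {1, 0, 3, 2} is an assumption read off the test case: it swaps the two 32-bit lanes inside each 64-bit half, which is what rev64 on a .4s arrangement does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the scalar lane swaps from foo, on a plain array. */
static void swap_pairs_by_inserts(uint32_t a[4])
{
    uint32_t t;
    t = a[0]; a[0] = a[1]; a[1] = t;
    t = a[2]; a[2] = a[3]; a[3] = t;
}

/* Hypothetical helper: the same result as a single permute.
   The selector {1, 0, 3, 2} is an assumption derived from the test case. */
static void swap_pairs_by_permute(const uint32_t in[4], uint32_t out[4])
{
    static const unsigned sel[4] = { 1, 0, 3, 2 };
    for (int i = 0; i < 4; i++)
        out[i] = in[sel[i]];
}

/* Returns 1 when both formulations produce identical lanes. */
static int lanes_agree(const uint32_t in[4])
{
    uint32_t a[4], b[4];
    memcpy(a, in, sizeof a);
    swap_pairs_by_inserts(a);
    swap_pairs_by_permute(in, b);
    return memcmp(a, b, sizeof a) == 0;
}

/* Small self-check on one sample vector. */
static void check_equivalence(void)
{
    const uint32_t v[4] = { 10, 20, 30, 40 };
    assert(lanes_agree(v));
}
```

Calling check_equivalence() from a test driver exercises both helpers on one sample input. The sketch only models the lane arithmetic; it says nothing about how match.pd handles the view converts between the uint64x2_t and uint32x4_t types.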
