https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117048

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Kyrylo Tkachov <ktkac...@gcc.gnu.org>:

https://gcc.gnu.org/g:1dcc6a1a67165a469d4cd9b6b39514c46cc656ad

commit r15-4270-g1dcc6a1a67165a469d4cd9b6b39514c46cc656ad
Author: Kyrylo Tkachov <ktkac...@nvidia.com>
Date:   Wed Oct 9 09:40:33 2024 -0700

    PR target/117048 aarch64: Use more canonical and optimization-friendly
    representation for XAR instruction

    The pattern for the Advanced SIMD XAR instruction isn't very
    optimization-friendly at the moment.
    In the testcase from the PR, once simplify-rtx has done its work, it
    generates the RTL:
    (set (reg:V2DI 119 [ _14 ])
        (rotate:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
                (reg:V2DI 116 [ *m1_01_8(D) ]))
            (const_vector:V2DI [
                    (const_int 32 [0x20]) repeated x2
                ])))

    which fails to match our XAR pattern because the pattern expects:
    1) A ROTATERT instead of the ROTATE.  However, according to the RTL ops
    documentation the preferred form of rotate-by-immediate is ROTATE, which
    I take to mean it's the canonical form.
    ROTATE (x, C) <-> ROTATERT (x, MODE_WIDTH - C) so it's better to match just
    one canonical representation.
    2) A CONST_INT shift amount whereas the midend asks for a repeated vector
    constant.
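
    The equivalence in point 1 can be checked with a small scalar model.
    This is only an illustrative sketch in plain C, not code from the patch:
    the helper names rotl64, rotr64 and check_rotate_forms are hypothetical,
    and masking the count to 0..63 is a choice made here to keep the C
    shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits; n is reduced modulo 64 so no shift reaches 64. */
static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Rotate right by n bits; n is reduced modulo 64 so no shift reaches 64. */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* Asserts ROTATE (x, C) == ROTATERT (x, 64 - C) for every C in 1..63. */
static inline void check_rotate_forms(uint64_t x)
{
    for (unsigned c = 1; c < 64; c++)
        assert(rotl64(x, c) == rotr64(x, 64 - c));
}
```

    For the rotate amount in the RTL above, 32, the two forms coincide:
    a left rotate by 32 and a right rotate by 32 give the same value.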

    These issues are fixed by introducing a dedicated expander for the
    aarch64_xarqv2di name, needed by the arm_neon.h intrinsic.  The expander
    translates the intrinsic-level CONST_INT immediate (the right-rotate
    amount) into a repeated vector constant holding 64 minus that amount,
    i.e. the corresponding left-rotate amount.  That constant is fed to the
    new representation of the XAR define_insn, which uses the ROTATE RTL
    code.  This is similar to how we handle the discrepancy between
    intrinsic-level and RTL-level vector lane numbers for big-endian.
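
    As a rough illustration of that translation, the sketch below models a
    two-lane XAR in plain C, once as XOR followed by a right rotate by the
    intrinsic immediate and once as XOR followed by a left rotate by the
    translated amount.  It is not the expander itself: the type v2di_model
    and all function names are hypothetical, and reducing 64 - imm modulo
    64 (so that imm == 0 maps to 0) is an assumption of this sketch, not a
    statement about how the patch treats that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a V2DI value: two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v2di_model;

static inline uint64_t model_rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

static inline uint64_t model_rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* Intrinsic-level view: XOR the inputs, rotate each lane right by imm. */
static inline v2di_model xar_by_rotatert(v2di_model a, v2di_model b, unsigned imm)
{
    v2di_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = model_rotr64(a.lane[i] ^ b.lane[i], imm);
    return r;
}

/* Right-rotate immediate -> left-rotate amount (sketch assumption: mod 64). */
static inline unsigned left_rotate_amount(unsigned imm)
{
    return (64 - imm) & 63;
}

/* ROTATE-based view: the same amount is applied to every lane, mirroring
   the repeated vector constant. */
static inline v2di_model xar_by_rotate(v2di_model a, v2di_model b, unsigned amount)
{
    v2di_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = model_rotl64(a.lane[i] ^ b.lane[i], amount);
    return r;
}

/* Asserts that both views agree for every immediate in 0..63. */
static inline void check_xar_views(v2di_model a, v2di_model b)
{
    for (unsigned imm = 0; imm < 64; imm++) {
        v2di_model r = xar_by_rotatert(a, b, imm);
        v2di_model l = xar_by_rotate(a, b, left_rotate_amount(imm));
        assert(r.lane[0] == l.lane[0] && r.lane[1] == l.lane[1]);
    }
}
```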

    With this patch and [1/2] the arithmetic parts of the testcase now simplify
    to just one XAR instruction.

    Bootstrapped and tested on aarch64-none-linux-gnu.

    Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

    gcc/
            PR target/117048
            * config/aarch64/aarch64-simd.md (aarch64_xarqv2di): Redefine into a
            define_expand.
            (*aarch64_xarqv2di_insn): Define.

    gcc/testsuite/
            PR target/117048
            * g++.target/aarch64/pr117048.C: New test.
