Hi all,

The MD pattern for the XAR instruction in SVE2 is currently expressed with non-canonical RTL, using a ROTATERT code with a constant rotate amount. Fix it by using the left ROTATE code. This necessitates splitting out the expander separately, so that it can translate the immediate coming from the intrinsic from a right-rotate amount to a left-rotate amount.
Additionally, as the SVE2 XAR instruction is unpredicated and can handle all element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE operation for Advanced SIMD modes where TARGET_SHA3 cannot be used (it can only handle V2DImode operands). Therefore let's extend the accepted modes of the SVE2 pattern to include the 128-bit Advanced SIMD integer modes.

This causes some tests for the svxar* intrinsics to fail, because they now simplify to a plain EOR when the rotate amount is the width of the element. This simplification is desirable (EOR instructions have equal or better throughput than XAR, and they are non-destructive of their input), so the tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when it is available (TARGET_SHA3) because it is non-destructive, so restrict the SVE2 pattern accordingly. Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Rename to...
	(*aarch64_sve2_xar<mode>_insn): ... This.  Use SVE_ASIMD_FULL_I
	iterator and adjust output logic.
	(@aarch64_sve2_xar<mode>): New define_expand.

gcc/testsuite/

	* gcc.target/aarch64/xar_neon_modes.c: New test.
	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather
	than XAR.
	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
v2-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch