Hi all,

The MD pattern for the XAR instruction in SVE2 is currently expressed with non-canonical RTL by using a ROTATERT code with a constant rotate amount. Fix it by using the left ROTATE code. This necessitates adjusting the rotate amount during expand.
Additionally, as the SVE2 XAR instruction is unpredicated and can handle all element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE operation for Advanced SIMD modes where TARGET_SHA3 cannot be used (that can only handle V2DImode operands). Therefore let's extend the accepted modes of the SVE2 pattern to include the Advanced SIMD integer modes.

This causes some tests for the svxar* intrinsics to fail because they now simplify to a plain EOR when the rotate amount is the width of the element. This simplification is desirable (EOR instructions have throughput better than or equal to XAR, and they are non-destructive of their input), so the tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when it is available (TARGET_SHA3) because it is non-destructive, so restrict the SVE2 pattern accordingly. Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Use
	SVE_ASIMD_FULL_I modes.  Use ROTATE code for the rotate step.
	Adjust output logic.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define.
	(svxar): Use the above.

gcc/testsuite/

	* gcc.target/aarch64/xar_neon_modes.c: New test.
	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather
	than XAR.
	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch