Hi all,

The MD pattern for the XAR instruction in SVE2 is currently expressed with non-canonical RTL by using a ROTATERT code with a constant rotate amount. Fix it by using the left ROTATE code. This necessitates adjusting the rotate amount during expand.
Additionally, as the SVE2 XAR instruction is unpredicated and can handle all element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE operation for Advanced SIMD modes where TARGET_SHA3 cannot be used (that can only handle V2DImode operands). Therefore let's extend the accepted modes of the SVE2 pattern to include the Advanced SIMD integer modes.

This causes some tests for the svxar* intrinsics to fail because they now simplify to a plain EOR when the rotate amount is the width of the element. This simplification is desirable (EOR instructions have throughput better than or equal to XAR, and they are non-destructive of their input), so the tests are adjusted.

For V2DImode XAR operations we should prefer the Advanced SIMD version when it is available (TARGET_SHA3) because it is non-destructive, so restrict the SVE2 pattern accordingly. Tests are added to confirm this.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Use
	SVE_ASIMD_FULL_I modes.  Use ROTATE code for the rotate step.
	Adjust output logic.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svxar_impl): Define.
	(svxar): Use the above.

gcc/testsuite/

	* gcc.target/aarch64/xar_neon_modes.c: New test.
	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather
	than XAR.
	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
v3-0002-aarch64-Use-canonical-RTL-representation-for-SVE2-XA.patch