Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi all,
>
> The MD pattern for the XAR instruction in SVE2 is currently expressed with
> non-canonical RTL by using a ROTATERT code with a constant rotate amount.
> Fix it by using the left ROTATE code. This necessitates splitting out the
> expander separately to translate the immediate coming from the intrinsic
> from a right-rotate to a left-rotate immediate.
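As background for that immediate translation: rotating right by N is the same
as rotating left by (element width - N), which is the arithmetic the new
expander performs.  A quick self-contained check of that identity
(illustration only, not part of the patch; the rotr32/rotl32 helpers are made
up for the example):

#include <cassert>
#include <cstdint>

/* Helpers invented for this example only.  */
static uint32_t rotr32 (uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static uint32_t rotl32 (uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

int
main ()
{
  uint32_t x = 0x12345678u;
  /* rotr (x, n) == rotl (x, 32 - n) for 0 < n < 32, which is why the
     expander can rewrite the intrinsic's right-rotate immediate as
     (element width - n) when emitting the canonical ROTATE form.  */
  for (unsigned n = 1; n < 32; ++n)
    assert (rotr32 (x, n) == rotl32 (x, 32 - n));
  return 0;
}

The same relationship is what lets the rotate-by-element-width cases in the
tests below collapse to a plain EOR, since a rotate by the full element width
is a no-op.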
Could we instead do the translation in aarch64-sve-builtins-sve2.cc?
It should be simpler to adjust there, by modifying the function_expander's
args array.  (A rough sketch of the kind of change I mean is at the end of
this message.)

> Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
> element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
> operation for Advanced SIMD modes where the TARGET_SHA3 cannot be used
> (that can only handle V2DImode operands). Therefore let's extend the accepted
> modes of the SVE2 pattern to include the 128-bit Advanced SIMD integer modes.

As mentioned in the other reply that I sent out of order, I think we could
also include the 64-bit modes.

LGTM otherwise FWIW.

Thanks,
Richard

>
> This leads to some tests for the svxar* intrinsics to fail because they now
> simplify to a plain EOR when the rotate amount is the width of the element.
> This simplification is desirable (EOR instructions have better or equal
> throughput than XAR, and they are non-destructive of their input) so the
> tests are adjusted.
>
> For V2DImode XAR operations we should prefer the Advanced SIMD version when
> it is available (TARGET_SHA3) because it is non-destructive, so restrict the
> SVE2 pattern accordingly. Tests are added to confirm this.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
> 	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
> 	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Rename
> 	to...
> 	(*aarch64_sve2_xar<mode>_insn): ... This. Use SVE_ASIMD_FULL_I
> 	iterator and adjust output logic.
> 	(@aarch64_sve2_xar<mode>): New define_expand.
>
> gcc/testsuite/
>
> 	* gcc.target/aarch64/xar_neon_modes.c: New test.
> 	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
> 	XAR.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
>
> From 41a7b2bfe69d7fc716b5da969d19185885c6b2bf Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Date: Tue, 22 Oct 2024 03:27:47 -0700
> Subject: [PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR
>  and extend it to fixed-width modes
>
> The MD pattern for the XAR instruction in SVE2 is currently expressed with
> non-canonical RTL by using a ROTATERT code with a constant rotate amount.
> Fix it by using the left ROTATE code. This necessitates splitting out the
> expander separately to translate the immediate coming from the intrinsic
> from a right-rotate to a left-rotate immediate.
>
> Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
> element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
> operation for Advanced SIMD modes where the TARGET_SHA3 cannot be used
> (that can only handle V2DImode operands). Therefore let's extend the accepted
> modes of the SVE2 pattern to include the 128-bit Advanced SIMD integer modes.
>
> This leads to some tests for the svxar* intrinsics to fail because they now
> simplify to a plain EOR when the rotate amount is the width of the element.
> This simplification is desirable (EOR instructions have better or equal
> throughput than XAR, and they are non-destructive of their input) so the
> tests are adjusted.
>
> For V2DImode XAR operations we should prefer the Advanced SIMD version when
> it is available (TARGET_SHA3) because it is non-destructive, so restrict the
> SVE2 pattern accordingly. Tests are added to confirm this.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
>
> Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>
>
> gcc/
>
> 	* config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
> 	* config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Rename
> 	to...
> 	(*aarch64_sve2_xar<mode>_insn): ... This. Use SVE_ASIMD_FULL_I
> 	iterator and adjust output logic.
> 	(@aarch64_sve2_xar<mode>): New define_expand.
>
> gcc/testsuite/
>
> 	* gcc.target/aarch64/xar_neon_modes.c: New test.
> 	* gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
> 	XAR.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
> 	* gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve2.md            | 39 ++++++++++++++++---
>  gcc/config/aarch64/iterators.md               |  3 ++
>  .../aarch64/sve2/acle/asm/xar_s16.c           | 18 ++++++---
>  .../aarch64/sve2/acle/asm/xar_s32.c           | 18 ++++++---
>  .../aarch64/sve2/acle/asm/xar_s64.c           | 18 ++++++---
>  .../gcc.target/aarch64/sve2/acle/asm/xar_s8.c | 18 ++++++---
>  .../aarch64/sve2/acle/asm/xar_u16.c           | 18 ++++++---
>  .../aarch64/sve2/acle/asm/xar_u32.c           | 18 ++++++---
>  .../aarch64/sve2/acle/asm/xar_u64.c           | 18 ++++++---
>  .../gcc.target/aarch64/sve2/acle/asm/xar_u8.c | 18 ++++++---
>  .../gcc.target/aarch64/xar_neon_modes.c       | 39 +++++++++++++++++++
>  .../gcc.target/aarch64/xar_v2di_nonsve.c      | 16 ++++++++
>  12 files changed, 188 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/xar_neon_modes.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/xar_v2di_nonsve.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 5f2697c3179..993fc5ebbe0 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1266,17 +1266,46 @@
>  ;; - XAR
>  ;; -------------------------------------------------------------------------
>
> -(define_insn "@aarch64_sve2_xar<mode>"
> +;; Also allow the 128-bit Advanced SIMD modes as the SVE2 XAR instruction
> +;; can handle more element sizes than the TARGET_SHA3 one from Advanced SIMD.
> +;; Don't allow the V2DImode use here unless !TARGET_SHA3 as the Advanced SIMD
> +;; version should be preferred when available as it is non-destructive on its
> +;; input.
> +(define_insn "*aarch64_sve2_xar<mode>_insn"
> +  [(set (match_operand:SVE_ASIMD_FULL_I 0 "register_operand" "=w,?&w")
> +        (rotate:SVE_ASIMD_FULL_I
> +          (xor:SVE_ASIMD_FULL_I
> +            (match_operand:SVE_ASIMD_FULL_I 1 "register_operand" "%0,w")
> +            (match_operand:SVE_ASIMD_FULL_I 2 "register_operand" "w,w"))
> +          (match_operand:SVE_ASIMD_FULL_I 3 "aarch64_simd_lshift_imm")))]
> +  "TARGET_SVE2 && !(<MODE>mode == V2DImode && TARGET_SHA3)"
> +  {
> +    operands[3]
> +      = GEN_INT (GET_MODE_UNIT_BITSIZE (<MODE>mode)
> +                 - INTVAL (unwrap_const_vec_duplicate (operands[3])));
> +    if (which_alternative == 0)
> +      return "xar\t%Z0.<Vetype>, %Z0.<Vetype>, %Z2.<Vetype>, #%3";
> +    return "movprfx\t%Z0, %Z1\;xar\t%Z0.<Vetype>, %Z0.<Vetype>, %Z2.<Vetype>, #%3";
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Translate the rotate right amount from the intrinsic semantics to the
> +;; canonical rotate left RTL amount.
> +(define_expand "@aarch64_sve2_xar<mode>"
>    [(set (match_operand:SVE_FULL_I 0 "register_operand")
> -        (rotatert:SVE_FULL_I
> +        (rotate:SVE_FULL_I
>           (xor:SVE_FULL_I
>             (match_operand:SVE_FULL_I 1 "register_operand")
>             (match_operand:SVE_FULL_I 2 "register_operand"))
>           (match_operand:SVE_FULL_I 3 "aarch64_simd_rshift_imm")))]
>    "TARGET_SVE2"
> -  {@ [ cons: =0 , 1  , 2 ; attrs: movprfx ]
> -     [ w        , %0 , w ; *              ] xar\t%0.<Vetype>, %0.<Vetype>, %2.<Vetype>, #%3
> -     [ ?&w      , w  , w ; yes            ] movprfx\t%0, %1\;xar\t%0.<Vetype>, %0.<Vetype>, %2.<Vetype>, #%3
> +  {
> +    HOST_WIDE_INT rotrt = INTVAL (unwrap_const_vec_duplicate (operands[3]));
> +    operands[3]
> +      = aarch64_simd_gen_const_vector_dup (<MODE>mode,
> +                                           GET_MODE_UNIT_BITSIZE (<MODE>mode)
> +                                           - rotrt);
>    }
>  )
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index efba78375c2..95ad163ec07 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -444,6 +444,9 @@
>  ;; All fully-packed SVE integer vector modes.
>  (define_mode_iterator SVE_FULL_I [VNx16QI VNx8HI VNx4SI VNx2DI])
>
> +;; All fully-packed SVE integer and Advanced SIMD quad integer modes.
> +(define_mode_iterator SVE_ASIMD_FULL_I [SVE_FULL_I VQ_I])
> +
>  ;; All fully-packed SVE floating-point vector modes.
>  (define_mode_iterator SVE_FULL_F [VNx8HF VNx4SF VNx2DF])
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s16.c
> index 34351d52718..f69ba3f7b06 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s16.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_s16_untied, svint16_t,
>
>  /*
>  ** xar_16_s16_tied1:
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_16_s16_tied1, svint16_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_16_s16_tied1, svint16_t,
>
>  /*
>  ** xar_16_s16_tied2:
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_16_s16_tied2, svint16_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_16_s16_tied2, svint16_t,
>  /*
>  ** xar_16_s16_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.h, z0\.h, z2\.h, #16
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s32.c
> index 366a6172807..540f7b875ec 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s32.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_s32_untied, svint32_t,
>
>  /*
>  ** xar_32_s32_tied1:
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_32_s32_tied1, svint32_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_32_s32_tied1, svint32_t,
>
>  /*
>  ** xar_32_s32_tied2:
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +** (
> +**	eor	z0\.d, z0\.d, z1\.d
> +** |
> +**	eor	z0\.d, z1\.d, z0\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_32_s32_tied2, svint32_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_32_s32_tied2, svint32_t,
>  /*
>  ** xar_32_s32_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.s, z0\.s, z2\.s, #32
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s64.c
> index dedda2ed044..9491dbdb848 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s64.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_s64_untied, svint64_t,
>
>  /*
>  ** xar_64_s64_tied1:
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_64_s64_tied1, svint64_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_64_s64_tied1, svint64_t,
>
>  /*
>  ** xar_64_s64_tied2:
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_64_s64_tied2, svint64_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_64_s64_tied2, svint64_t,
>  /*
>  ** xar_64_s64_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.d, z0\.d, z2\.d, #64
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s8.c
> index 904352b93da..e62e5bca5ba 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_s8.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_s8_untied, svint8_t,
>
>  /*
>  ** xar_8_s8_tied1:
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_8_s8_tied1, svint8_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_8_s8_tied1, svint8_t,
>
>  /*
>  ** xar_8_s8_tied2:
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_8_s8_tied2, svint8_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_8_s8_tied2, svint8_t,
>  /*
>  ** xar_8_s8_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.b, z0\.b, z2\.b, #8
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u16.c
> index c7b9665aeed..6269145bc6d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u16.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_u16_untied, svuint16_t,
>
>  /*
>  ** xar_16_u16_tied1:
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_16_u16_tied1, svuint16_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_16_u16_tied1, svuint16_t,
>
>  /*
>  ** xar_16_u16_tied2:
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_16_u16_tied2, svuint16_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_16_u16_tied2, svuint16_t,
>  /*
>  ** xar_16_u16_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.h, z0\.h, z2\.h, #16
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.h, z0\.h, z1\.h, #16
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u32.c
> index 115ead7701c..99efd14e1ed 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u32.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_u32_untied, svuint32_t,
>
>  /*
>  ** xar_32_u32_tied1:
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_32_u32_tied1, svuint32_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_32_u32_tied1, svuint32_t,
>
>  /*
>  ** xar_32_u32_tied2:
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_32_u32_tied2, svuint32_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_32_u32_tied2, svuint32_t,
>  /*
>  ** xar_32_u32_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.s, z0\.s, z2\.s, #32
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.s, z0\.s, z1\.s, #32
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u64.c
> index 1d0d90e90d6..5c770ffdadb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u64.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_u64_untied, svuint64_t,
>
>  /*
>  ** xar_64_u64_tied1:
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_64_u64_tied1, svuint64_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_64_u64_tied1, svuint64_t,
>
>  /*
>  ** xar_64_u64_tied2:
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_64_u64_tied2, svuint64_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_64_u64_tied2, svuint64_t,
>  /*
>  ** xar_64_u64_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.d, z0\.d, z2\.d, #64
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.d, z0\.d, z1\.d, #64
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u8.c
> index 3b6161729cb..5ae5323a08a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/xar_u8.c
> @@ -70,7 +70,11 @@ TEST_UNIFORM_Z (xar_2_u8_untied, svuint8_t,
>
>  /*
>  ** xar_8_u8_tied1:
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_8_u8_tied1, svuint8_t,
> @@ -79,7 +83,11 @@ TEST_UNIFORM_Z (xar_8_u8_tied1, svuint8_t,
>
>  /*
>  ** xar_8_u8_tied2:
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +** (
> +**	eor	z0\.d, z1\.d, z0\.d
> +** |
> +**	eor	z0\.d, z0\.d, z1\.d
> +** )
>  **	ret
>  */
>  TEST_UNIFORM_Z (xar_8_u8_tied2, svuint8_t,
> @@ -89,11 +97,9 @@ TEST_UNIFORM_Z (xar_8_u8_tied2, svuint8_t,
>  /*
>  ** xar_8_u8_untied:
>  ** (
> -**	movprfx	z0, z1
> -**	xar	z0\.b, z0\.b, z2\.b, #8
> +**	eor	z0\.d, z1\.d, z2\.d
>  ** |
> -**	movprfx	z0, z2
> -**	xar	z0\.b, z0\.b, z1\.b, #8
> +**	eor	z0\.d, z2\.d, z1\.d
>  ** )
>  **	ret
>  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/xar_neon_modes.c b/gcc/testsuite/gcc.target/aarch64/xar_neon_modes.c
> new file mode 100644
> index 00000000000..750fbcfc48a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/xar_neon_modes.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#pragma GCC target "+sve2+nosha3"
> +
> +typedef char __attribute__ ((vector_size (16))) v16qi;
> +typedef unsigned short __attribute__ ((vector_size (16))) v8hi;
> +typedef unsigned int __attribute__ ((vector_size (16))) v4si;
> +typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
> +
> +v16qi
> +xar_v16qi (v16qi a, v16qi b) {
> +  v16qi c = a ^ b;
> +  return (c << 2) ^ (c >> 6);
> +}
> +/* { dg-final { scan-assembler {\txar\tz0.b, z[0-9]+.b, z[0-9]+.b, #6} } } */
> +
> +v8hi
> +xar_v8hi (v8hi a, v8hi b) {
> +  v8hi c = a ^ b;
> +  return (c << 13) ^ (c >> 3);
> +}
> +/* { dg-final { scan-assembler {\txar\tz0.h, z[0-9]+.h, z[0-9]+.h, #3} } } */
> +
> +v4si
> +xar_v4si (v4si a, v4si b) {
> +  v4si c = a ^ b;
> +  return (c << 9) ^ (c >> 23);
> +}
> +/* { dg-final { scan-assembler {\txar\tz0.s, z[0-9]+.s, z[0-9]+.s, #23} } } */
> +
> +/* When +sha3 for Advanced SIMD is not available we should still use the
> +   SVE2 form of XAR.  */
> +v2di
> +xar_v2di (v2di a, v2di b) {
> +  v2di c = a ^ b;
> +  return (c << 22) ^ (c >> 42);
> +}
> +/* { dg-final { scan-assembler {\txar\tz0.d, z[0-9]+.d, z[0-9]+.d, #42} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/xar_v2di_nonsve.c b/gcc/testsuite/gcc.target/aarch64/xar_v2di_nonsve.c
> new file mode 100644
> index 00000000000..b0f1a97222b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/xar_v2di_nonsve.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#pragma GCC target "+sve2+sha3"
> +
> +typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
> +
> +/* Both +sve2 and +sha3 have V2DImode XAR instructions, but we should
> +   prefer the Advanced SIMD one when both are available.  */
> +v2di
> +xar_v2di (v2di a, v2di b) {
> +  v2di c = a ^ b;
> +  return (c << 22) ^ (c >> 42);
> +}
> +/* { dg-final { scan-assembler {\txar\tv0.2d, v[0-9]+.2d, v[0-9]+.2d, 42} } } */
> +
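For reference, here is a rough sketch of the alternative mentioned above:
doing the rotate-amount translation when expanding the svxar intrinsic in
aarch64-sve-builtins-sve2.cc, rather than in a separate define_expand.  This
is only an illustration, not the actual change; the class name and the exact
handling of the immediate argument are assumptions, and the class would still
need to be hooked up to the svxar entry in that file:

/* Sketch only: convert the intrinsic's rotate-right amount into the
   rotate-left amount expected by the canonical ROTATE pattern, by
   rewriting the third argument before emitting the instruction.  */
class svxar_impl : public function_base
{
public:
  rtx
  expand (function_expander &e) const override
  {
    machine_mode mode = e.vector_mode (0);
    HOST_WIDE_INT rot_left
      = GET_MODE_UNIT_BITSIZE (mode)
        - INTVAL (unwrap_const_vec_duplicate (e.args[2]));
    /* Replace the rotate amount in the args array, as suggested above.  */
    e.args[2] = aarch64_simd_gen_const_vector_dup (mode, rot_left);
    return e.use_exact_insn (code_for_aarch64_sve2_xar (mode));
  }
};

This does the same arithmetic as the define_expand in the patch, but would
presumably leave the .md side with just the single canonical insn pattern.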