Soumya AR <soum...@nvidia.com> writes:
> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain
> operands
> inverted. These can be extended to work with Neon modes.
>
> Since these instructions are unpredicated, duplicate patterns were added with
> the predicate removed to generate these instructions for Neon modes.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR <soum...@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve2.md
> 	(*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated
> 	form.
> 	(*aarch64_sve2_bsl1n_unpred<mode>): Likewise.
> 	(*aarch64_sve2_bsl2n_unpred<mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/bitsel.c: New test.
>
> From 4be7a0ba40150322e2f67214f54c55da51ad2fd2 Mon Sep 17 00:00:00 2001
> From: Soumya AR <soum...@nvidia.com>
> Date: Mon, 25 Nov 2024 11:25:44 +0530
> Subject: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.
>
> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain
> operands
> inverted. These can be extended to work with Neon modes.
>
> Since these instructions are unpredicated, duplicate patterns were added with
> the predicate removed to generate these instructions for Neon modes.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR <soum...@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve2.md
> 	(*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated
> 	form.
> 	(*aarch64_sve2_bsl1n_unpred<mode>): Likewise.
> 	(*aarch64_sve2_bsl2n_unpred<mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/bitsel.c: New test.
Looks good, but some comments about the formatting:

> ---
>  gcc/config/aarch64/aarch64-sve2.md            | 71 +++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/sve/bitsel.c | 35 +++++++++
>  2 files changed, 106 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> index e67421bad84..6d38c40e573 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1696,6 +1696,23 @@
> }
> )
>
> +(define_insn "*aarch64_sve2_nbsl_unpred<mode>"
> + [(set (match_operand:VDQ_I 0 "register_operand")
> + (not:VDQ_I
> + (xor:VDQ_I
> + (and:VDQ_I
> + (xor:VDQ_I
> + (match_operand:VDQ_I 1 "register_operand")
> + (match_operand:VDQ_I 2 "register_operand"))
> + (match_operand:VDQ_I 3 "register_operand"))
> + (match_dup BSL_DUP))))]
> + "TARGET_SVE2"
> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] nbsl\t%Z0.d,
> %Z0.d, %Z<bsl_dup>.d, %Z3.d
> + [ ?&w , w , w , w ; yes ] movprfx\t%Z0,
> %Z<bsl_mov>\;nbsl\t%Z0.d, %Z0.d, %Z<bsl_dup>.d, %Z3.d
> + }
> +)
> +
> ;; Unpredicated bitwise select with inverted first operand.
> ;; (op3 ? ~bsl_mov : bsl_dup) == ((~(bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)
> (define_expand "@aarch64_sve2_bsl1n<mode>"
> @@ -1741,6 +1758,23 @@
> }
> )
>
> +(define_insn "*aarch64_sve2_bsl1n_unpred<mode>"
> + [(set (match_operand:VDQ_I 0 "register_operand")
> + (xor:VDQ_I
> + (and:VDQ_I
> + (not:VDQ_I
> + (xor:VDQ_I
> + (match_operand:VDQ_I 1 "register_operand")
> + (match_operand:VDQ_I 2 "register_operand")))

Removing the unspecs from the SVE patterns leaves these operations
indented by an unusual amount.  Generally:

- if the first operand is on the same line as the operator (as with the
  set above), the other operands are indented to be underneath the first
  operand.

- if an operator appears on a line by itself, the operands are generally
  indented by two spaces more than the operator.
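To make that concrete, on my reading of those two rules the nbsl body
above would be laid out something like this (an illustrative sketch
only, not the reposted patch):

```lisp
[(set (match_operand:VDQ_I 0 "register_operand")
      (not:VDQ_I
        (xor:VDQ_I
          (and:VDQ_I
            (xor:VDQ_I
              (match_operand:VDQ_I 1 "register_operand")
              (match_operand:VDQ_I 2 "register_operand"))
            (match_operand:VDQ_I 3 "register_operand"))
          (match_dup BSL_DUP))))]
```

i.e. the not is aligned under the set's first operand, and each
operator that starts its own line has its operands indented two
columns further.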
Same for the others.  Also:

> + (match_operand:VDQ_I 3 "register_operand"))
> + (match_dup BSL_DUP)))]
> + "TARGET_SVE2"
> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] bsl1n\t%Z0.d,
> %Z0.d, %Z<bsl_dup>.d, %Z3.d
> + [ ?&w , w , w , w ; yes ] movprfx\t%Z0,
> %Z<bsl_mov>\;bsl1n\t%Z0.d, %Z0.d, %Z<bsl_dup>.d, %Z3.d
> + }
> +)
> +
> ;; Unpredicated bitwise select with inverted second operand.
> ;; (bsl_dup ? bsl_mov : ~op3) == ((bsl_dup & bsl_mov) | (~op3 & ~bsl_dup))
> (define_expand "@aarch64_sve2_bsl2n<mode>"
> @@ -1815,6 +1849,43 @@
> }
> )
>
> +(define_insn "*aarch64_sve2_bsl2n_unpred<mode>"
> + [(set (match_operand:VDQ_I 0 "register_operand")
> + (ior:VDQ_I
> + (and:VDQ_I
> + (match_operand:VDQ_I 1 "register_operand")
> + (match_operand:VDQ_I 2 "register_operand"))
> + (and:VDQ_I
> + (not:VDQ_I
> + (match_operand:VDQ_I 3 "register_operand"))
> + (not:VDQ_I
> + (match_dup BSL_DUP)))
> + ))]

...all the close brackets would normally be on the same line as the
match_dup.

Elsewhere I said:

...the work seems like an ongoing project with no well-defined end
point, so it seemed like the GCC 15 cut-off would have to be
time-driven rather than feature-driven.

But another valid cut-off would be to take what has already been
posted.  So let's do that :)  Please repost the reformatted patch
for review though.
Thanks,
Richard

> + "TARGET_SVE2"
> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] bsl2n\t%Z0.d,
> %Z0.d, %Z3.d, %Z<bsl_dup>.d
> + [ ?&w , w , w , w ; yes ] movprfx\t%Z0,
> %Z<bsl_mov>\;bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
> + }
> +)
> +
> +(define_insn "*aarch64_sve2_bsl2n_unpred<mode>"
> + [(set (match_operand:VDQ_I 0 "register_operand")
> + (ior:VDQ_I
> + (and:VDQ_I
> + (match_operand:VDQ_I 1 "register_operand")
> + (match_operand:VDQ_I 2 "register_operand"))
> + (and:VDQ_I
> + (not:VDQ_I
> + (match_dup BSL_DUP))
> + (not:VDQ_I
> + (match_operand:VDQ_I 3 "register_operand")))))]
> + "TARGET_SVE2"
> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] bsl2n\t%Z0.d,
> %Z0.d, %Z3.d, %Z<bsl_dup>.d
> + [ ?&w , w , w , w ; yes ] movprfx\t%Z0,
> %Z<bsl_mov>\;bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
> + }
> +)
> +
> ;; -------------------------------------------------------------------------
> ;; ---- [INT] Shift-and-accumulate operations
> ;; -------------------------------------------------------------------------
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
> b/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
> new file mode 100644
> index 00000000000..635bfefc17c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
> @@ -0,0 +1,35 @@
> +/* { dg-options "-O2 -mcpu=neoverse-v2 --param
> aarch64-autovec-preference=asimd-only" } */
> +
> +#include <stdint.h>
> +
> +#define OPNBSL(x,y,z) (~(((x) & (z)) | ((y) & ~(z))))
> +#define OPBSL1N(x,y,z) ((~(x) & (z)) | ((y) & ~(z)))
> +#define OPBSL2N(x,y,z) (((x) & (z)) | (~(y) & ~(z)))
> +
> +#define N 1024
> +
> +#define TYPE(N) int##N##_t
> +
> +#define TEST(SIZE, OP, SUFFIX) \
> +void __attribute__ ((noinline, noclone)) \
> +f_##SIZE##_##SUFFIX \
> + (TYPE(SIZE) *restrict a, TYPE(SIZE) *restrict b, \
> + TYPE(SIZE) *restrict c, TYPE(SIZE) *restrict d) \
> +{ \
> + for (int i = 0; i < N; i++) \
> + a[i] = OP (b[i], c[i], d[i]); \
> +}
> +
> +#define TEST_ALL(SIZE) \
> + TEST(SIZE, OPNBSL, nbsl) \
> + TEST(SIZE, OPBSL1N, bsl1n) \
> + TEST(SIZE, OPBSL2N, bsl2n)
> +
> +TEST_ALL(8);
> +TEST_ALL(16);
> +TEST_ALL(32);
> +TEST_ALL(64);
> +
> +/* { dg-final { scan-assembler-times {\tnbsl\tz[0-9]+\.d, z[0-9]+\.d,
> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tbsl1n\tz[0-9]+\.d, z[0-9]+\.d,
> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
> +/* { dg-final { scan-assembler-times {\tbsl2n\tz[0-9]+\.d, z[0-9]+\.d,
> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
> \ No newline at end of file