Soumya AR <soum...@nvidia.com> writes: > Currently, we vectorize CTZ for SVE by using the following operation: > .CTZ (X) = (PREC - 1) - .CLZ (X & -X) > > Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in > PR109498. > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > OK for mainline? > > Signed-off-by: Soumya AR <soum...@nvidia.com> > > gcc/ChangeLog: > PR target/109498 > * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand > CTZ to RBIT + CLZ for SVE. > > gcc/testsuite/ChangeLog: > PR target/109498 > * gcc.target/aarch64/sve/ctz.c: New test.
Generally looks good, but a couple of comments: > --- > gcc/config/aarch64/aarch64-sve.md | 16 +++++++ > gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++++++++++++++++++++++ > 2 files changed, 65 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c > > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index bfa28849adf..10094f156b3 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -3088,6 +3088,22 @@ > ;; - NOT > ;; ------------------------------------------------------------------------- > > +(define_expand "ctz<mode>2" > + [(set (match_operand:SVE_I 0 "register_operand") > + (unspec:SVE_I > + [(match_dup 2) > + (ctz:SVE_I > + (match_operand:SVE_I 1 "register_operand"))] > + UNSPEC_PRED_X))] > + "TARGET_SVE" > + { > + operands[2] = aarch64_ptrue_reg (<VPRED>mode); There's no real need to use operands[...] here. It can just be a local variable. > + emit_insn (gen_aarch64_pred_rbit<mode> (operands[0], > operands[2],operands[1])); > + emit_insn (gen_aarch64_pred_clz<mode> (operands[0], operands[2], > operands[0])); Formatting nit: C++ lines should be 80 characters or fewer. More importantly, I think we should use a fresh register for the temporary (RBIT) result, since that tends to permit more optimisation. Thanks, Richard > + DONE; > + } > +) > + > ;; Unpredicated integer unary arithmetic. > (define_expand "<optab><mode>2" > [(set (match_operand:SVE_I 0 "register_operand") > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ctz.c > b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c > new file mode 100644 > index 00000000000..433a9174f48 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c > @@ -0,0 +1,49 @@ > +/* { dg-final { check-function-bodies "**" "" } } */ > +/* { dg-options "-O3 --param aarch64-autovec-preference=sve-only" } */ > + > +#include <stdint.h> > + > +#define FUNC(FUNCTION, NAME, DTYPE) \ > +void \ > +NAME (DTYPE *__restrict x, DTYPE *__restrict y, int n) { \ > + for (int i = 0; i < n; i++) \ > + x[i] = FUNCTION (y[i]); \ > +} \ > + > + > +/* > +** ctz_uint8: > +** ... > +** rbit z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b > +** clz z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b > +** ... > +*/ > +FUNC (__builtin_ctzg, ctz_uint8, uint8_t) > + > +/* > +** ctz_uint16: > +** ... > +** rbit z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h > +** clz z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h > +** ... > +*/ > +FUNC (__builtin_ctzg, ctz_uint16, uint16_t) > + > +/* > +** ctz_uint32: > +** ... > +** rbit z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s > +** clz z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s > +** ... > +*/ > +FUNC (__builtin_ctz, ctz_uint32, uint32_t) > + > +/* > +** ctz_uint64: > +** ... > +** rbit z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d > +** clz z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d > +** ... > +*/ > +FUNC (__builtin_ctzll, ctz_uint64, uint64_t) > + > -- > 2.43.2