> On 1 Oct 2024, at 6:17 PM, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
> Soumya AR <soum...@nvidia.com> writes:
>> Currently, we vectorize CTZ for SVE by using the following operation:
>> .CTZ (X) = (PREC - 1) - .CLZ (X & -X)
>>
>> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in
>> PR109498.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>
>> gcc/ChangeLog:
>> 	PR target/109498
>> 	* config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand
>> 	CTZ to RBIT + CLZ for SVE.
>>
>> gcc/testsuite/ChangeLog:
>> 	PR target/109498
>> 	* gcc.target/aarch64/sve/ctz.c: New test.
>
> Generally looks good, but a couple of comments:
>
>> ---
>>  gcc/config/aarch64/aarch64-sve.md          | 16 +++++++
>>  gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++++++++++++++++++++++
>>  2 files changed, 65 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
>> index bfa28849adf..10094f156b3 100644
>> --- a/gcc/config/aarch64/aarch64-sve.md
>> +++ b/gcc/config/aarch64/aarch64-sve.md
>> @@ -3088,6 +3088,22 @@
>>  ;; - NOT
>>  ;; -------------------------------------------------------------------------
>>
>> +(define_expand "ctz<mode>2"
>> +  [(set (match_operand:SVE_I 0 "register_operand")
>> +	(unspec:SVE_I
>> +	  [(match_dup 2)
>> +	   (ctz:SVE_I
>> +	     (match_operand:SVE_I 1 "register_operand"))]
>> +	  UNSPEC_PRED_X))]
>> +  "TARGET_SVE"
>> +  {
>> +    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
>
> There's no real need to use operands[...] here.  It can just be
> a local variable.
>
>> +    emit_insn (gen_aarch64_pred_rbit<mode> (operands[0], operands[2],operands[1]));
>> +    emit_insn (gen_aarch64_pred_clz<mode> (operands[0], operands[2], operands[0]));
>
> Formatting nit: C++ lines should be 80 characters or fewer.
>
> More importantly, I think we should use a fresh register for the
> temporary (RBIT) result, since that tends to permit more optimisation.
Thanks for the feedback! Attaching an updated patch with the suggested changes.

Regards,
Soumya

> Thanks,
> Richard
>
>> +    DONE;
>> +  }
>> +)
>> +
>>  ;; Unpredicated integer unary arithmetic.
>>  (define_expand "<optab><mode>2"
>>    [(set (match_operand:SVE_I 0 "register_operand")
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ctz.c b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>> new file mode 100644
>> index 00000000000..433a9174f48
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>> @@ -0,0 +1,49 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O3 --param aarch64-autovec-preference=sve-only" } */
>> +
>> +#include <stdint.h>
>> +
>> +#define FUNC(FUNCTION, NAME, DTYPE)				\
>> +void								\
>> +NAME (DTYPE *__restrict x, DTYPE *__restrict y, int n) {	\
>> +  for (int i = 0; i < n; i++)					\
>> +    x[i] = FUNCTION (y[i]);					\
>> +}								\
>> +
>> +
>> +/*
>> +** ctz_uint8:
>> +**	...
>> +**	rbit	z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b
>> +**	clz	z[0-9]+\.b, p[0-7]/m, z[0-9]+\.b
>> +**	...
>> +*/
>> +FUNC (__builtin_ctzg, ctz_uint8, uint8_t)
>> +
>> +/*
>> +** ctz_uint16:
>> +**	...
>> +**	rbit	z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h
>> +**	clz	z[0-9]+\.h, p[0-7]/m, z[0-9]+\.h
>> +**	...
>> +*/
>> +FUNC (__builtin_ctzg, ctz_uint16, uint16_t)
>> +
>> +/*
>> +** ctz_uint32:
>> +**	...
>> +**	rbit	z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s
>> +**	clz	z[0-9]+\.s, p[0-7]/m, z[0-9]+\.s
>> +**	...
>> +*/
>> +FUNC (__builtin_ctz, ctz_uint32, uint32_t)
>> +
>> +/*
>> +** ctz_uint64:
>> +**	...
>> +**	rbit	z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d
>> +**	clz	z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d
>> +**	...
>> +*/
>> +FUNC (__builtin_ctzll, ctz_uint64, uint64_t)
>> +
>> --
>> 2.43.2
0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch