Soumya AR <soum...@nvidia.com> writes: >> On 1 Oct 2024, at 6:17 PM, Richard Sandiford <richard.sandif...@arm.com> >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Soumya AR <soum...@nvidia.com> writes: >>> Currently, we vectorize CTZ for SVE by using the following operation: >>> .CTZ (X) = (PREC - 1) - .CLZ (X & -X) >>> >>> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in >>> PR109498. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no >>> regression. >>> OK for mainline? >>> >>> Signed-off-by: Soumya AR <soum...@nvidia.com> >>> >>> gcc/ChangeLog: >>> PR target/109498 >>> * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand >>> CTZ to RBIT + CLZ for SVE. >>> >>> gcc/testsuite/ChangeLog: >>> PR target/109498 >>> * gcc.target/aarch64/sve/ctz.c: New test. >> >> Generally looks good, but a couple of comments: >> >>> --- >>> gcc/config/aarch64/aarch64-sve.md | 16 +++++++ >>> gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++++++++++++++++++++++ >>> 2 files changed, 65 insertions(+) >>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c >>> >>> diff --git a/gcc/config/aarch64/aarch64-sve.md >>> b/gcc/config/aarch64/aarch64-sve.md >>> index bfa28849adf..10094f156b3 100644 >>> --- a/gcc/config/aarch64/aarch64-sve.md >>> +++ b/gcc/config/aarch64/aarch64-sve.md >>> @@ -3088,6 +3088,22 @@ >>> ;; - NOT >>> ;; ------------------------------------------------------------------------- >>> >>> +(define_expand "ctz<mode>2" >>> + [(set (match_operand:SVE_I 0 "register_operand") >>> + (unspec:SVE_I >>> + [(match_dup 2) >>> + (ctz:SVE_I >>> + (match_operand:SVE_I 1 "register_operand"))] >>> + UNSPEC_PRED_X))] >>> + "TARGET_SVE" >>> + { >>> + operands[2] = aarch64_ptrue_reg (<VPRED>mode); >> >> There's no real need to use operands[...] here. It can just be >> a local variable. >> >>> + emit_insn (gen_aarch64_pred_rbit<mode> (operands[0], >>> operands[2],operands[1])); >>> + emit_insn (gen_aarch64_pred_clz<mode> (operands[0], operands[2], >>> operands[0])); >> >> Formatting nit: C++ lines should be 80 characters or fewer. >> >> More importantly, I think we should use a fresh register for the >> temporary (RBIT) result, since that tends to permit more optimisation. > > Thanks for the feedback! Attaching an updated patch with the suggested > changes.
Thanks, LGTM. Pushed to trunk. Richard