Soumya AR <soum...@nvidia.com> writes:
>> On 1 Oct 2024, at 6:17 PM, Richard Sandiford <richard.sandif...@arm.com> 
>> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Soumya AR <soum...@nvidia.com> writes:
>>> Currently, we vectorize CTZ for SVE by using the following operation:
>>> .CTZ (X) = (PREC - 1) - .CLZ (X & -X)
>>>
>>> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in 
>>> PR109498.
>>>
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>>
>>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>      PR target/109498
>>>      * config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand
>>>        CTZ to RBIT + CLZ for SVE.
>>>
>>> gcc/testsuite/ChangeLog:
>>>      PR target/109498
>>>      * gcc.target/aarch64/sve/ctz.c: New test.
>>
>> Generally looks good, but a couple of comments:
>>
>>> ---
>>> gcc/config/aarch64/aarch64-sve.md          | 16 +++++++
>>> gcc/testsuite/gcc.target/aarch64/sve/ctz.c | 49 ++++++++++++++++++++++
>>> 2 files changed, 65 insertions(+)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ctz.c
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-sve.md 
>>> b/gcc/config/aarch64/aarch64-sve.md
>>> index bfa28849adf..10094f156b3 100644
>>> --- a/gcc/config/aarch64/aarch64-sve.md
>>> +++ b/gcc/config/aarch64/aarch64-sve.md
>>> @@ -3088,6 +3088,22 @@
>>> ;; - NOT
>>> ;; -------------------------------------------------------------------------
>>>
>>> +(define_expand "ctz<mode>2"
>>> +  [(set (match_operand:SVE_I 0 "register_operand")
>>> +     (unspec:SVE_I
>>> +       [(match_dup 2)
>>> +        (ctz:SVE_I
>>> +          (match_operand:SVE_I 1 "register_operand"))]
>>> +       UNSPEC_PRED_X))]
>>> +  "TARGET_SVE"
>>> +  {
>>> +     operands[2] = aarch64_ptrue_reg (<VPRED>mode);
>>
>> There's no real need to use operands[...] here.  It can just be
>> a local variable.
>>
>>> +     emit_insn (gen_aarch64_pred_rbit<mode> (operands[0], 
>>> operands[2],operands[1]));
>>> +     emit_insn (gen_aarch64_pred_clz<mode> (operands[0], operands[2], 
>>> operands[0]));
>>
>> Formatting nit: C++ lines should be 80 characters or fewer.
>>
>> More importantly, I think we should use a fresh register for the
>> temporary (RBIT) result, since that tends to permit more optimisation.
>
> Thanks for the feedback! Attaching an updated patch with the suggested 
> changes.

Thanks, LGTM.  Pushed to trunk.

Richard

Reply via email to