Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]

Richard Sandiford Tue, 20 Aug 2024 11:17:35 -0700

Richard Sandiford <richard.sandif...@arm.com> writes:
> Andrew Pinski <quic_apin...@quicinc.com> writes:
>> When CSSC is not enabled, 128bit popcount can be implemented
>> just via the vector (v16qi) cnt instruction followed by a reduction,
>> like how the 64bit one is currently implemented instead of
>> splitting into 2 64bit popcount.
>>
>> Build and tested for aarch64-linux-gnu.
>>
>>      PR target/113042
>>
>> gcc/ChangeLog:
>>
>>      * config/aarch64/aarch64.md (popcountti2): New define_expand.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      * gcc.target/aarch64/popcnt10.c: New test.
>>      * gcc.target/aarch64/popcnt9.c: New test.
>
> OK if there are no other comments in the next 24 hours.


Sorry, only thought about it later, but:

>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 12dcc16529a..73506e71f43 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5378,6 +5378,22 @@ (define_expand "popcount<mode>2"
>>      }
>>  })
>>  
>> +(define_expand "popcountti2"
>> +  [(set (match_operand:TI 0 "register_operand")
>> +    (popcount:TI (match_operand:TI 1 "register_operand")))]

Could you try making the output :DI instead of :TI?  I'd expect
internal-fn.cc to handle that correctly and extend the result to
128 bits where needed.

That would make the dummy popcount rtx malformed, so I suppose
the pattern should just be:

  [(match_operand:DI 0 "register_operand")
   (match_operand:TI 1 "register_operand")]

>> +  "TARGET_SIMD && !TARGET_CSSC"
>> +{
>> +  rtx v = gen_reg_rtx (V16QImode);
>> +  rtx v1 = gen_reg_rtx (V16QImode);
>> +  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
>> +  emit_insn (gen_popcountv16qi2 (v1, v));
>> +  rtx out = gen_reg_rtx (DImode);
>> +  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));

We could then use operands[0] directly as the output here.

Thanks,
Richard

>> +  out = convert_to_mode (TImode, out, true);
>> +  emit_move_insn (operands[0], out);
>> +  DONE;
>> +})

Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]

Reply via email to