Re: [PATCH] aarch64: Use SVE ASRD instruction with Neon modes.

Soumya AR Tue, 10 Dec 2024 02:00:11 -0800

Hi Richard,

Thanks for reviewing this!


I’ve made the suggested changes and added the aarch64_ptrue_reg overload.
Thank you for providing the implementation for this, I have added you 
as a co-author for the patch, hope that works :)

Best,
Soumya


> On 5 Dec 2024, at 10:08 PM, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Soumya AR <soum...@nvidia.com> writes:
>> The ASRD instruction on SVE performs an arithmetic shift right by an 
>> immediate
>> for divide.
>> 
>> This patch enables the use of ASRD with Neon modes.
>> 
>> For example:
>> 
>> int in[N], out[N];
>> 
>> void
>> foo (void)
>> {
>>  for (int i = 0; i < N; i++)
>>    out[i] = in[i] / 4;
>> }
>> 
>> compiles to:
>> 
>>      ldr     q31, [x1, x0]
>>      cmlt    v30.16b, v31.16b, #0
>>      and     z30.b, z30.b, 3
>>      add     v30.16b, v30.16b, v31.16b
>>      sshr    v30.16b, v30.16b, 2
>>      str     q30, [x0, x2]
>>      add     x0, x0, 16
>>      cmp     x0, 1024
>> 
>> but can just be:
>> 
>>      ldp     q30, q31, [x0], 32
>>      asrd    z31.b, p7/m, z31.b, #2
>>      asrd    z30.b, p7/m, z30.b, #2
>>      stp     q30, q31, [x1], 32
>>      cmp     x0, x2
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>> 
>> gcc/ChangeLog:
>> 
>>      * config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3 and
>>      *sdiv_pow2<mode>3 to support Neon modes.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * gcc.target/aarch64/sve/sve-asrd.c: New test.
>> ---
>> gcc/config/aarch64/aarch64-sve.md             | 25 ++++-----
>> .../gcc.target/aarch64/sve/sve-asrd.c         | 54 +++++++++++++++++++
>> 2 files changed, 67 insertions(+), 12 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c
> 
> Rearranging to put the testcase first:
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c
>> new file mode 100644
>> index 00000000000..00aa8b2380d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/sve-asrd.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-Ofast --param aarch64-autovec-preference=asimd-only" } */
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>> +
>> +#include <stdint.h>
>> +#define N 1024
>> +
>> +#define FUNC(M)                     \
>> +M in_##M[N];                        \
>> +M out_##M[N];                       \
>> +void asrd_##M() {                   \
>> +  for (int i = 0; i < N; i++)       \
>> +    out_##M[i] = in_##M[i] / 4;     \
>> +}
>> +
>> +/*
>> +** asrd_int8_t:
>> +**   ...
>> +**   ptrue   (p[0-7]).b, vl1
>> +**   ...
>> +**   asrd    z[0-9]+\.b, \1/m, z[0-9]+\.b, #2
>> +**   ...
>> +*/
>> +FUNC(int8_t)
> 
> ptrue ..., vl1 will only do the shift/division on the first element.
> It should instead be vl8 for 64-bit modes and vl16 for 128-bit modes.
> 
>> +
>> +/*
>> +** asrd_int16_t:
>> +**   ...
>> +**   ptrue   (p[0-7]).b, vl2
>> +**   ...
>> +**   asrd    z[0-9]+\.h, \1/m, z[0-9]+\.h, #2
>> +**   ...
>> +*/
>> +FUNC(int16_t)
>> +
>> +/*
>> +** asrd_int32_t:
>> +**   ...
>> +**   ptrue   (p[0-7]).b, vl4
>> +**   ...
>> +**   asrd    z[0-9]+\.s, \1/m, z[0-9]+\.s, #2
>> +**   ...
>> +*/
>> +FUNC(int32_t)
>> +
>> +/*
>> +** asrd_int64_t:
>> +**   ...
>> +**   ptrue   (p[0-7]).b, vl8
>> +**   ...
>> +**   asrd    z[0-9]+\.d, \1/m, z[0-9]+\.d, #2
>> +**   ...
>> +*/
>> +FUNC(int64_t)
>> diff --git a/gcc/config/aarch64/aarch64-sve.md 
>> b/gcc/config/aarch64/aarch64-sve.md
>> index affdb24a93d..96effe4abed 100644
>> --- a/gcc/config/aarch64/aarch64-sve.md
>> +++ b/gcc/config/aarch64/aarch64-sve.md
>> @@ -4972,34 +4972,35 @@
>> 
>> ;; Unpredicated ASRD.
>> (define_expand "sdiv_pow2<mode>3"
>> -  [(set (match_operand:SVE_I 0 "register_operand")
>> -     (unspec:SVE_I
>> +  [(set (match_operand:SVE_VDQ_I 0 "register_operand")
>> +     (unspec:SVE_VDQ_I
>>        [(match_dup 3)
>> -        (unspec:SVE_I
>> -          [(match_operand:SVE_I 1 "register_operand")
>> +        (unspec:SVE_VDQ_I
>> +          [(match_operand:SVE_VDQ_I 1 "register_operand")
>>            (match_operand 2 "aarch64_simd_rshift_imm")]
>>           UNSPEC_ASRD)]
>>       UNSPEC_PRED_X))]
>>   "TARGET_SVE"
>>   {
>> -    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
>> +    operands[3] = aarch64_ptrue_reg (<VPRED>mode,
>> +                                 GET_MODE_UNIT_SIZE (<MODE>mode));
> 
> Perhaps we should add yet another overload of aarch64_ptrue_reg that
> takes the data mode as a second argument.  The implementation could
> be something like:
> 
> /* Return a register of mode PRED_MODE for controlling data of mode DATA_MODE.
> 
>   DATA_MODE can be a scalar, an Advanced SIMD vector, or an SVE vector.
>   If it's an N-byte scalar or an Advanced SIMD vector, the first N bits
>   of the predicate will be active and the rest will be inactive.
>   If DATA_MODE is an SVE mode, every bit of the predicate will be active.  */
> static rtx
> aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)
> {
>  if (aarch64_sve_mode_p (data_mode))
>    return aarch64_ptrue_reg (pred_mode);
> 
>  auto size = GET_MODE_SIZE (data_mode).to_constant ();
>  return aarch64_ptrue_reg (pred_mode, size);
> }
> 
> Thanks,
> Richard
> 
>>   }
>> )
>> 
>> ;; Predicated ASRD.
>> (define_insn "*sdiv_pow2<mode>3"
>> -  [(set (match_operand:SVE_I 0 "register_operand")
>> -     (unspec:SVE_I
>> +  [(set (match_operand:SVE_VDQ_I 0 "register_operand")
>> +     (unspec:SVE_VDQ_I
>>        [(match_operand:<VPRED> 1 "register_operand")
>> -        (unspec:SVE_I
>> -          [(match_operand:SVE_I 2 "register_operand")
>> -           (match_operand:SVE_I 3 "aarch64_simd_rshift_imm")]
>> +        (unspec:SVE_VDQ_I
>> +          [(match_operand:SVE_VDQ_I 2 "register_operand")
>> +           (match_operand:SVE_VDQ_I 3 "aarch64_simd_rshift_imm")]
>>           UNSPEC_ASRD)]
>>        UNSPEC_PRED_X))]
>>   "TARGET_SVE"
>>   {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
>> -     [ w        , Upl , 0 ; *              ] asrd\t%0.<Vetype>, %1/m, 
>> %0.<Vetype>, #%3
>> -     [ ?&w      , Upl , w ; yes            ] movprfx\t%0, 
>> %2\;asrd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
>> +     [ w        , Upl , 0 ; *              ] asrd\t%Z0.<Vetype>, %1/m, 
>> %Z0.<Vetype>, #%3
>> +     [ ?&w      , Upl , w ; yes            ] movprfx\t%Z0, 
>> %Z2\;asrd\t%Z0.<Vetype>, %1/m, %Z0.<Vetype>, #%3
>>   }
>> )

v3-0001-aarch64-Use-SVE-ASRD-instruction-with-Neon-modes.patch
Description: v3-0001-aarch64-Use-SVE-ASRD-instruction-with-Neon-modes.patch

Re: [PATCH] aarch64: Use SVE ASRD instruction with Neon modes.

Reply via email to