> On 30 Aug 2024, at 14:17, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> 
> Jennifer Schmitz <jschm...@nvidia.com> writes:
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it calls vector_const_binop with
>> aarch64_const_binop as the callback and tree_code TRUNC_DIV_EXPR to fold
>> constant integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>>      * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>      Try constant folding.
>>      * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>>      Add special case for division by 0.
>> 
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> 
>> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschm...@nvidia.com>
>> Date: Thu, 29 Aug 2024 05:04:51 -0700
>> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>> 
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it calls vector_const_binop with
>> aarch64_const_binop as the callback and tree_code TRUNC_DIV_EXPR to fold
>> constant integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>>      * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>      Try constant folding.
>>      * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>>      Add special case for division by 0.
>> 
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc      |  19 +-
>> gcc/config/aarch64/aarch64-sve-builtins.cc    |   4 +
>> .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
>> 3 files changed, 356 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index d55bee0b72f..617c7fc87e5 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -755,8 +755,21 @@ public:
>>   gimple *
>>   fold (gimple_folder &f) const override
>>   {
>> -    tree divisor = gimple_call_arg (f.call, 2);
>> -    tree divisor_cst = uniform_integer_cst_p (divisor);
>> +    tree pg = gimple_call_arg (f.call, 0);
>> +    tree op1 = gimple_call_arg (f.call, 1);
>> +    tree op2 = gimple_call_arg (f.call, 2);
>> +
>> +    /* Try to fold constant integer operands.  */
>> +    if (f.type_suffix (0).integer_p
>> +        && (f.pred == PRED_x
>> +            || is_ptrue (pg, f.type_suffix (0).element_bytes)))
>> +      if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
>> +                                         aarch64_const_binop))
>> +        return gimple_build_assign (f.lhs, res);
> 
> To reduce cut-&-paste, it'd be good to put this in a helper:
> 
>  gimple *gimple_folder::fold_const_binary (tree_code code);
> 
> that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
> It could return null on failure.  Then the caller can just be:
> 
>  if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
>    return res;
> 
> This could go right at the top of the function, since it doesn't rely
> on any of the local variables above.
Done.
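For reference, a minimal sketch of such a helper, assuming the guard logic
stays as in the hunk above (the name, placement, and details in the revised
patch may differ slightly):

  /* Sketch only: fold a call with constant integer operands by applying
     CODE element-wise, if the predication guarantees that inactive lanes
     don't matter (_x predication or an all-true predicate).  Returns null
     on failure.  */
  gimple *
  gimple_folder::fold_const_binary (enum tree_code code)
  {
    tree pg = gimple_call_arg (call, 0);
    tree op1 = gimple_call_arg (call, 1);
    tree op2 = gimple_call_arg (call, 2);

    if (type_suffix (0).integer_p
        && (pred == PRED_x
            || is_ptrue (pg, type_suffix (0).element_bytes)))
      if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
        return gimple_build_assign (lhs, res);

    return NULL;
  }

The svdiv fold then just calls f.fold_const_binary (TRUNC_DIV_EXPR) at the
top of the function, as suggested.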
> 
>> +
>> +    /* If the divisor is a uniform power of 2, fold to a shift
>> +       instruction.  */
>> +    tree divisor_cst = uniform_integer_cst_p (op2);
>> 
>>     if (!divisor_cst || !integer_pow2p (divisor_cst))
>>       return NULL;
>> @@ -770,7 +783,7 @@ public:
>>                                  shapes::binary_uint_opt_n, MODE_n,
>>                                  f.type_suffix_ids, GROUP_none, f.pred);
>>      call = f.redirect_call (instance);
>> -     tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
>> +     tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>>      new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>>       }
>>     else
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 315d5ac4177..c1b28ebfe4e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
>>       signop sign = TYPE_SIGN (type);
>>       wi::overflow_type overflow = wi::OVF_NONE;
>> 
>> +      /* Return 0 for division by 0.  */
>> +      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
>> +        return arg2;
>> +
>>       if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>>         return NULL_TREE;
>>       return force_fit_type (type, poly_res, false,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> new file mode 100644
>> index 00000000000..062fb6e560e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> @@ -0,0 +1,336 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "arm_sve.h"
>> +
>> +/*
>> +** s64_x_pg:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_by0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_z_pg:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_0:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_by0:
>> +**   mov     (z[0-9]+\.d), #5
>> +**   mov     (z[0-9]+)\.b, #0
>> +**   sdivr   \2\.d, p[0-7]/m, \2\.d, \1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_m_pg:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   sdiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svint64_t s64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_m_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_by0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_0:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_by0:
>> +**   mov     (z[0-9]+\.d), #5
>> +**   mov     (z[0-9]+)\.b, #0
>> +**   sdivr   \2\.d, p[0-7]/m, \2\.d, \1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_m_pg_n:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   sdiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svint64_t s64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_m_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_pg:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_pg:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_pg:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   udiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_pg_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_pg_n:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_pg_n:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   udiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
> 
> These are good tests, but maybe we could throw in a small number
> of svdupq tests as well, to test for non-uniform cases.  E.g.:
> 
>  svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));
> 
> which hopefully should get optimised to zero.
> 
> Similarly:
> 
>  svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));
> 
> should get optimised to -2.
> 
> Looks good to me otherwise.
> 
> Thanks,
> Richard
Thanks for suggesting these tests; I added them to the test file.
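For reference, the added cases look roughly like the sketch below; the
function names and the exact mov patterns in the expected assembly are
illustrative here rather than quoted verbatim from the revised patch:

/*
** s32_m_ptrue_dupq:
**   mov     z[0-9]+\.b, #0
**   ret
*/
svint32_t s32_m_ptrue_dupq ()
{
  return svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11),
                      svdupq_s32 (4, 1, -6, 0));
}

/*
** s32_z_ptrue_dupq:
**   mov     z[0-9]+\.s, #-2
**   ret
*/
svint32_t s32_z_ptrue_dupq ()
{
  return svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4),
                      svdupq_s32 (-3, 15, -50, 2));
}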
Best, Jennifer

Attachment: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch