Jennifer Schmitz <jschm...@nvidia.com> writes:
> As follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul and svdiv by -1 to svneg for
> unsigned SVE vector types.  The key idea is to reuse the existing code that
> does this fold for signed types and feed it as callback to a helper function
> that adds the necessary type conversions.
I only meant that we should do this for multiplication (since the sign
isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
be right for unsigned division, since unsigned division by the maximum
value is instead equivalent to x == MAX ? 1 : 0.  (There's a quick
illustration of the difference just after the quoted ChangeLog below.)

Some comments on the multiplication bit below:

> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   return svmul_n_u64_x (pg, x, -1);
> }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   svuint64_t D.12921;
>   svint64_t D.12920;
>   svuint64_t D.12919;
>
>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>   D.12921 = svneg_s64_x (pg, D.12920);
>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>   goto <D.12922>;
>   <D.12922>:
>   return D.12919;
> }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts all non-boolean vector types to the target type,
> - replaces the converted arguments in the function call,
> - calls the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type.  If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
> 	function and pass to gimple_folder::convert_and_fold to enable
> 	the transform for unsigned types.
> 	(svdiv_impl::fold): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(gimple_folder::convert_and_fold): New function that converts
> 	operands to target type before calling callback function, adding the
> 	necessary conversion statements.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(gimple_folder::convert_and_fold): Declare function.
> 	(signed_type_suffix_index): Return type_suffix_index of signed
> 	vector type for given width.
> 	(function_instance::signed_type): Return signed vector type for
> 	given width.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
> 	outcome.
> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> 	expected outcome.
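To illustrate the multiplication/division asymmetry above with 8-bit
elements, here is a quick host-side sanity check (purely illustrative;
it is not meant for the patch or the testsuite):

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    for (unsigned int i = 0; i < 256; ++i)
      {
        uint8_t x = i;

        /* Multiplying by the all-ones value is the same as negating,
           whatever the signedness: x * 255 == -x (mod 256).  */
        if ((uint8_t) (x * UINT8_MAX) != (uint8_t) -x)
          printf ("mul mismatch for %u\n", i);

        /* Unsigned division by the all-ones value is not a negation:
           x / 255 is 1 for x == 255 and 0 otherwise.  */
        if (x / UINT8_MAX != (x == UINT8_MAX ? 1u : 0u))
          printf ("div mismatch for %u\n", i);
      }
    return 0;
  }

The same argument carries over to wider elements, which is why reusing
the signed svneg fold is fine for svmul but would be wrong for svdiv.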
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>  .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>  9 files changed, 180 insertions(+), 50 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 1c9f515a52c..6df14a8f4c4 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> [...]
> @@ -2082,33 +2091,49 @@ public:
>        return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>
>      /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))

Formatting nit, sorry, but the indentation looks off here.

>        {
> -        function_instance instance ("svneg", functions::svneg,
> -                                    shapes::unary, MODE_none,
> -                                    f.type_suffix_ids, GROUP_none, f.pred);
> -        gcall *call = f.redirect_call (instance);
> -        unsigned offset_index = 0;
> -        if (f.pred == PRED_m)
> +        auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>            {
> -            offset_index = 1;
> -            gimple_call_set_arg (call, 0, op1);
> -          }
> -        else
> -          gimple_set_num_ops (call, 5);
> -        gimple_call_set_arg (call, offset_index, pg);
> -        gimple_call_set_arg (call, offset_index + 1, negated_op);
> -        return call;

Rather than having convert_and_fold modify the call in-place (which
would create an incorrectly typed call if we leave the function itself
unchanged), could we make it pass back a "tree" that contains the
(possibly converted) lhs and a "vec<tree> &" that contains the
(possibly converted) arguments?  (There's a rough sketch of what I mean
further down, before my sign-off.)

> +            tree pg = gimple_call_arg (f.call, 0);
> +            tree op1 = gimple_call_arg (f.call, 1);
> +            tree op2 = gimple_call_arg (f.call, 2);
> +            tree negated_op = op1;
> +            bool negate_op1 = true;
> +            if (integer_minus_onep (op1))
> +              {
> +                negated_op = op2;
> +                negate_op1 = false;
> +              }
> +            type_suffix_pair signed_tsp =
> +              {signed_type_suffix_index (f.type_suffix (0).element_bits),
> +               f.type_suffix_ids[1]};
> +            function_instance instance ("svneg", functions::svneg,
> +                                        shapes::unary, MODE_none,
> +                                        signed_tsp, GROUP_none, f.pred);
> +            gcall *call = f.redirect_call (instance);
> +            unsigned offset = 0;
> +            if (f.pred == PRED_m)
> +              {
> +                offset = 1;
> +                tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +                tree inactive = negate_op1 ? gimple_call_arg (f.call, 1)
> +                                           : build_minus_one_cst (ty);
> +                gimple_call_set_arg (call, 0, inactive);

Isn't the old approach of using op1 as the inactive argument still
correct?

Otherwise it looks good, thanks.
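To be a bit more concrete about the convert_and_fold comment above, the
interface I had in mind is roughly the following (untested, and the
parameter names are only placeholders):

  /* Convert the lhs and the non-boolean vector arguments to TYPE and
     hand the converted values to *FP, instead of changing the arguments
     of the original call in place.  */
  gimple *convert_and_fold (tree type,
                            gimple *(*fp) (gimple_folder &,
                                           tree converted_lhs,
                                           vec<tree> &converted_args));

The callback would build the new call from CONVERTED_LHS and
CONVERTED_ARGS, while convert_and_fold stays responsible for inserting
the VIEW_CONVERT_EXPR statements around it, so the original
unsigned-typed call never ends up with signed-typed arguments.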
Richard

> +              }
> +            else
> +              gimple_set_num_ops (call, 5);
> +            gimple_call_set_arg (call, offset, pg);
> +            gimple_call_set_arg (call, offset + 1, negated_op);
> +            return call;
> +          };
> +        tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +        return f.convert_and_fold (ty, mul_by_m1);
>        }
>
>      /* If one of the operands is a uniform power of 2, fold to a left shift
>         by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>      tree op1_cst = uniform_integer_cst_p (op1);
>      tree op2_cst = uniform_integer_cst_p (op2);
>      tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 44b7f6edae5..6523a782dac 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3576,6 +3576,46 @@ gimple_folder::redirect_pred_x ()
>    return redirect_call (instance);
>  }
>
> +/* Convert all non-boolean vector-type operands to TYPE, fold the call
> +   using callback F, and convert the result back to the original type.
> +   Add the necessary conversion statements.  Return the updated call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type, gcall *(*fp) (gimple_folder &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +              && TYPE_MODE (type) != VNx16BImode);
> +  tree old_type = TREE_TYPE (lhs);
> +  if (useless_type_conversion_p (type, old_type))
> +    return fp (*this);
> +
> +  unsigned int num_args = gimple_call_num_args (call);
> +  gimple_seq stmts = NULL;
> +  tree op, op_type, t1, t2, t3;
> +  gimple *g;
> +  gcall *new_call;
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_type = TREE_TYPE (op);
> +      if (VECTOR_TYPE_P (op_type)
> +          && TREE_CODE (op) != VECTOR_CST
> +          && TYPE_MODE (op_type) != VNx16BImode)
> +        {
> +          t1 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op);
> +          gimple_call_set_arg (call, i, t1);
> +        }
> +    }
> +
> +  new_call = fp (*this);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  t2 = create_tmp_var (gimple_call_return_type (new_call));
> +  gimple_call_set_lhs (new_call, t2);
> +  t3 = build1 (VIEW_CONVERT_EXPR, old_type, t2);
> +  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t3);
> +  gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +  return new_call;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index d5cc6e0a40d..b48fed0df6b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -630,6 +631,7 @@ public:
>
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gcall *(*) (gimple_folder &));
>
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -860,6 +862,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>    gcc_unreachable ();
>  }
>
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return TYPE_SUFFIX_s8;
> +    case 16: return TYPE_SUFFIX_s16;
> +    case 32: return TYPE_SUFFIX_s32;
> +    case 64: return TYPE_SUFFIX_s64;
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1025,6 +1041,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +    case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +    case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +    case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the number of elements of type suffix I that fit within a
>     128-bit block.  */
>  inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> index 1e8d6104845..fcadc05d75b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
>                  z0 = svdiv_n_u32_m (p0, z1, 1),
>                  z0 = svdiv_m (p0, z1, 1))
>
> +/*
> +** div_m1_u32_m_tied1:
> +**     neg     z0\.s, p0/m, z0\.s
> +**     ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u32_m_tied1, svuint32_t,
> +                z0 = svdiv_n_u32_m (p0, z0, -1),
> +                z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u32_m_tied1:
>  **     lsr     z0\.s, p0/m, z0\.s, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> index ab049affb63..b95d5af13d8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
>                  z0 = svdiv_n_u64_m (p0, z1, 1),
>                  z0 = svdiv_m (p0, z1, 1))
>
> +/*
> +** div_m1_u64_m_tied1:
> +**     neg     z0\.d, p0/m, z0\.d
> +**     ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u64_m_tied1, svuint64_t,
> +                z0 = svdiv_n_u64_m (p0, z0, -1),
> +                z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u64_m_tied1:
>  **     lsr     z0\.d, p0/m, z0\.d, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>
>  /*
>  ** mul_m1_u16_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.h, p0/m, z0\.h, \1\.h
> +**     neg     z0\.h, p0/m, z0\.h
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>
>  /*
>  ** mul_m1_u16_x:
> -**     mul     z0\.h, z0\.h, #-1
> +**     neg     z0\.h, p0/m, z0\.h
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>
>  /*
>  ** mul_m1_u32_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.s, p0/m, z0\.s, \1\.s
> +**     neg     z0\.s, p0/m, z0\.s
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>
>  /*
>  ** mul_m1_u32_x:
> -**     mul     z0\.s, z0\.s, #-1
> +**     neg     z0\.s, p0/m, z0\.s
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>
>  /*
>  ** mul_m1_u64_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.d, p0/m, z0\.d, \1\.d
> +**     neg     z0\.d, p0/m, z0\.d
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>                  z0 = svmul_n_u64_m (p0, z0, -1),
>                  z0 = svmul_m (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_m:
> +**     mov     (z[0-9]+)\.b, #-1
> +**     mov     (z[0-9]+\.d), z0\.d
> +**     movprfx z0, \1
> +**     neg     z0\.d, p0/m, \2
> +**     ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> +                z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> +                z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_u64_z_tied1:
>  **     movprfx z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>
>  /*
>  ** mul_m1_u64_x:
> -**     mul     z0\.d, z0\.d, #-1
> +**     neg     z0\.d, p0/m, z0\.d
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>                  z0 = svmul_n_u64_x (p0, z0, -1),
>                  z0 = svmul_x (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_x:
> +**     neg     z0\.d, p0/m, z0\.d
> +**     ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> +                z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> +                z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_m127_u64_x:
>  **     mul     z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>
>  /*
>  ** mul_m1_u8_m:
> -**     mov     (z[0-9]+)\.b, #-1
> -**     mul     z0\.b, p0/m, z0\.b, \1\.b
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>
>  /*
>  ** mul_255_u8_x:
> -**     mul     z0\.b, z0\.b, #-1
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>
>  /*
>  ** mul_m1_u8_x:
> -**     mul     z0\.b, z0\.b, #-1
> +**     neg     z0\.b, p0/m, z0\.b
>  **     ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,