Ping.
Thanks,
Jennifer

> On 3 Dec 2024, at 09:14, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> 
> Ping.
> Thanks,
> Jennifer
> 
>> On 26 Nov 2024, at 09:18, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>> 
>> 
>> 
>>> On 20 Nov 2024, at 13:43, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandif...@arm.com> wrote:
>>>>> 
>>>>> External email: Use caution opening links or attachments
>>>>> 
>>>>> 
>>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>>> As follow-up to
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>>>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>>>>> unsigned SVE vector types.  The key idea is to reuse the existing code
>>>>>> that does this fold for signed types and feed it as callback to a helper
>>>>>> function that adds the necessary type conversions.
>>>>> 
>>>>> I only meant that we should do this for multiplication (since the sign
>>>>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>>>>> be right for unsigned division, since unsigned division by the maximum
>>>>> value is instead equivalent to x == MAX ? 1 : 0.
>>>>> 
>>>>> Some comments on the multiplication bit below:
>>>>> 
>>>>>> 
>>>>>> For example, for the test case
>>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>>> {
>>>>>>   return svmul_n_u64_x (pg, x, -1);
>>>>>> }
>>>>>> 
>>>>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>>> {
>>>>>>   svuint64_t D.12921;
>>>>>>   svint64_t D.12920;
>>>>>>   svuint64_t D.12919;
>>>>>> 
>>>>>>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>>>>   D.12921 = svneg_s64_x (pg, D.12920);
>>>>>>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>>>>   goto <D.12922>;
>>>>>>   <D.12922>:
>>>>>>   return D.12919;
>>>>>> }
>>>>>> 
>>>>>> In general, the new helper gimple_folder::convert_and_fold
>>>>>> - takes a target type and a function pointer,
>>>>>> - converts all non-boolean vector types to the target type,
>>>>>> - replaces the converted arguments in the function call,
>>>>>> - calls the callback function,
>>>>>> - adds the necessary view converts to the gimple sequence,
>>>>>> - and returns the new call.
>>>>>> 
>>>>>> Because all arguments are converted to the same target types, the helper
>>>>>> function is only suitable for folding calls whose arguments are all of
>>>>>> the same type.  If necessary, this could be extended to convert the
>>>>>> arguments to different types differentially.
>>>>>> 
>>>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>>>> OK for mainline?
>>>>>> 
>>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>>> 
>>>>>> gcc/ChangeLog:
>>>>>> 
>>>>>> 	* config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>>>> 	function and pass to gimple_folder::convert_and_fold to enable
>>>>>> 	the transform for unsigned types.
>>>>>> 	(svdiv_impl::fold): Likewise.
>>>>>> 	* config/aarch64/aarch64-sve-builtins.cc
>>>>>> 	(gimple_folder::convert_and_fold): New function that converts
>>>>>> 	operands to target type before calling callback function, adding the
>>>>>> 	necessary conversion statements.
>>>>>> 	* config/aarch64/aarch64-sve-builtins.h
>>>>>> 	(gimple_folder::convert_and_fold): Declare function.
>>>>>> 	(signed_type_suffix_index): Return type_suffix_index of signed
>>>>>> 	vector type for given width.
>>>>>> 	(function_instance::signed_type): Return signed vector type for
>>>>>> 	given width.
>>>>>> 
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> 
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>>>> 	outcome.
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>>>> 	expected outcome.
>>>>>> ---
>>>>>> .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>>>>>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>>>>>> gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>>>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>>>>> 
>>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> index 1c9f515a52c..6df14a8f4c4 100644
>>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> [...]
>>>>>> @@ -2082,33 +2091,49 @@ public:
>>>>>>        return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>>>>> 
>>>>>>      /* If one of the operands is all integer -1, fold to svneg.  */
>>>>>> -    tree pg = gimple_call_arg (f.call, 0);
>>>>>> -    tree negated_op = NULL;
>>>>>> -    if (integer_minus_onep (op2))
>>>>>> -      negated_op = op1;
>>>>>> -    else if (integer_minus_onep (op1))
>>>>>> -      negated_op = op2;
>>>>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>>>>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>>>> 
>>>>> Formatting nit, sorry, but: indentation looks off.
>>>>> 
>>>>>>       {
>>>>>> -       function_instance instance ("svneg", functions::svneg,
>>>>>> -                                   shapes::unary, MODE_none,
>>>>>> -                                   f.type_suffix_ids, GROUP_none, f.pred);
>>>>>> -       gcall *call = f.redirect_call (instance);
>>>>>> -       unsigned offset_index = 0;
>>>>>> -       if (f.pred == PRED_m)
>>>>>> +       auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>>>> 	  {
>>>>>> -	    offset_index = 1;
>>>>>> -	    gimple_call_set_arg (call, 0, op1);
>>>>>> -	  }
>>>>>> -       else
>>>>>> -         gimple_set_num_ops (call, 5);
>>>>>> -       gimple_call_set_arg (call, offset_index, pg);
>>>>>> -       gimple_call_set_arg (call, offset_index + 1, negated_op);
>>>>>> -       return call;
>>>>> 
>>>>> Rather than having convert_and_fold modify the call in-place (which
>>>>> would create an incorrectly typed call if we leave the function itself
>>>>> unchanged), could we make it pass back a "tree" that contains the
>>>>> (possibly converted) lhs and a "vec<tree> &" that contains the
>>>>> (possibly converted) arguments?
>>>> Dear Richard,
>>>> I agree that it's preferable not to have convert_and_fold modify the call
>>>> in-place and wanted to ask for clarification about which sort of design
>>>> you had in mind:
>>>> Option 1:
>>>> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv),
>>>> such that the function takes a target type TYPE to which all non-boolean
>>>> vector operands are converted and a reference to a vec<tree> that it
>>>> fills with converted argument trees.  It returns a tree with the (possibly
>>>> converted) lhs.  It also adds the view_convert statements, but makes no
>>>> changes to the call itself.
>>>> The caller of the function uses the converted arguments and lhs to
>>>> assemble the new gcall.
>>>> 
>>>> Option 2:
>>>> gcall *gimple_folder::convert_and_fold (tree type,
>>>> gcall *(*fp) (tree, vec<tree>, gimple_folder &)), where the function
>>>> converts the lhs and arguments to TYPE and assigns them to the newly
>>>> created tree lhs_conv and vec<tree> args_conv that are passed to the
>>>> function pointer FP.  The callback assembles the new call and returns it
>>>> to convert_and_fold, which adds the necessary conversion statements before
>>>> returning the new call to the caller.  So, in this case it would be the
>>>> callback modifying the call.
>>> 
>>> Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
>>> I suppose the callback should return a gimple * rather than a gcall *
>>> though, in case the callback wants to create a gassign instead.
>>> 
>>> (Thanks for checking btw)
>> Thanks for clarifying.  Below you find the updated patch.  I made the
>> following changes:
>> - removed folding of svdiv and reverted the test cases
>> - pass lhs_conv and args_conv to callback instead of having convert_and_fold
>>   change the call in place
>> - re-validated, no regression.
>> Thanks,
>> Jennifer
>> 
>> 
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul by -1 to svneg for
>> unsigned SVE vector types.  The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
>> 
>> For example, for the test case
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>   return svmul_n_u64_x (pg, x, -1);
>> }
>> 
>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>   svint64_t D.12921;
>>   svint64_t D.12920;
>>   svuint64_t D.12919;
>> 
>>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>   D.12921 = svneg_s64_x (pg, D.12920);
>>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>   goto <D.12922>;
>>   <D.12922>:
>>   return D.12919;
>> }
>> 
>> In general, the new helper gimple_folder::convert_and_fold
>> - takes a target type and a function pointer,
>> - converts the lhs and all non-boolean vector types to the target type,
>> - passes the converted lhs and arguments to the callback,
>> - receives the new gimple statement from the callback function,
>> - adds the necessary view converts to the gimple sequence,
>> - and returns the new call.
>> 
>> Because all arguments are converted to the same target types, the helper
>> function is only suitable for folding calls whose arguments are all of
>> the same type.  If necessary, this could be extended to convert the
>> arguments to different types differentially.
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
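As an illustration of the transform (a sketch, not part of the patch): at the
ACLE level, the emitted gimple corresponds to reinterpreting the unsigned
vector as signed, negating it, and reinterpreting back, with the svreinterpret
calls playing the role of the VIEW_CONVERT_EXPRs above.  The function name
below is made up for the example.

#include <arm_sve.h>

/* Hand-written equivalent of what svmul_n_u64_x (pg, x, -1) folds to:
   view the operand as signed, negate it, and view the result as unsigned.  */
svuint64_t
foo_as_neg (svbool_t pg, svuint64_t x)
{
  svint64_t xs = svreinterpret_s64_u64 (x);
  return svreinterpret_u64_s64 (svneg_s64_x (pg, xs));
}

With -O2 and SVE enabled this should compile to the same single predicated
neg instruction that the adjusted tests below expect.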
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/ChangeLog:
>> 
>> 	* config/aarch64/aarch64-sve-builtins-base.cc
>> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
>> 	function and pass to gimple_folder::convert_and_fold to enable
>> 	the transform for unsigned types.
>> 	* config/aarch64/aarch64-sve-builtins.cc
>> 	(gimple_folder::convert_and_fold): New function that converts
>> 	operands to target type before calling callback function, adding the
>> 	necessary conversion statements.
>> 	* config/aarch64/aarch64-sve-builtins.h
>> 	(gimple_folder::convert_and_fold): Declare function.
>> 	(signed_type_suffix_index): Return type_suffix_index of signed
>> 	vector type for given width.
>> 	(function_instance::signed_type): Return signed vector type for
>> 	given width.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
>> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>> 	expected outcome.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
>> gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>> 7 files changed, 153 insertions(+), 34 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 87e9909b55a..52401a8c57a 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -2092,33 +2092,61 @@ public:
>>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>>     /* If one of the operands is all integer -1, fold to svneg.  */
>> -    tree pg = gimple_call_arg (f.call, 0);
>> -    tree negated_op = NULL;
>> -    if (integer_minus_onep (op2))
>> -      negated_op = op1;
>> -    else if (integer_minus_onep (op1))
>> -      negated_op = op2;
>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>       {
>> -       function_instance instance ("svneg", functions::svneg,
>> -                                   shapes::unary, MODE_none,
>> -                                   f.type_suffix_ids, GROUP_none, f.pred);
>> -       gcall *call = f.redirect_call (instance);
>> -       unsigned offset_index = 0;
>> -       if (f.pred == PRED_m)
>> +       auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
>> +                           vec<tree> &args_conv) -> gimple *
>>          {
>> -           offset_index = 1;
>> -           gimple_call_set_arg (call, 0, op1);
>> -         }
>> -       else
>> -         gimple_set_num_ops (call, 5);
>> -       gimple_call_set_arg (call, offset_index, pg);
>> -       gimple_call_set_arg (call, offset_index + 1, negated_op);
>> -       return call;
>> +           gcc_assert (lhs_conv && args_conv.length () == 3);
>> +           tree pg = args_conv[0];
>> +           tree op1 = args_conv[1];
>> +           tree op2 = args_conv[2];
>> +           tree negated_op = op1;
>> +           bool negate_op1 = true;
>> +           if (integer_minus_onep (op1))
>> +             {
>> +               negated_op = op2;
>> +               negate_op1 = false;
>> +             }
>> +           type_suffix_pair signed_tsp =
>> +             {signed_type_suffix_index (f.type_suffix (0).element_bits),
>> +              f.type_suffix_ids[1]};
>> +           function_instance instance ("svneg", functions::svneg,
>> +                                       shapes::unary, MODE_none,
>> +                                       signed_tsp, GROUP_none, f.pred);
>> +           gcall *call = f.redirect_call (instance);
>> +           gimple_call_set_lhs (call, lhs_conv);
>> +           unsigned offset = 0;
>> +           tree fntype, op1_type = TREE_TYPE (op1);
>> +           if (f.pred == PRED_m)
>> +             {
>> +               offset = 1;
>> +               tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
>> +               fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +                                                   3, arg_types);
>> +               tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> +               tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
>> +               gimple_call_set_arg (call, 0, inactive);
>> +             }
>> +           else
>> +             {
>> +               gimple_set_num_ops (call, 5);
>> +               tree arg_types[2] = {TREE_TYPE (pg), op1_type};
>> +               fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +                                                   2, arg_types);
>> +             }
>> +           gimple_call_set_fntype (call, fntype);
>> +           gimple_call_set_arg (call, offset, pg);
>> +           gimple_call_set_arg (call, offset + 1, negated_op);
>> +           return call;
>> +         };
>> +       tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> +       return f.convert_and_fold (ty, mul_by_m1);
>>       }
>> 
>>     /* If one of the operands is a uniform power of 2, fold to a left shift
>>        by immediate.  */
>> +    tree pg = gimple_call_arg (f.call, 0);
>>     tree op1_cst = uniform_integer_cst_p (op1);
>>     tree op2_cst = uniform_integer_cst_p (op2);
>>     tree shift_op1, shift_op2 = NULL;
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 0fec1cd439e..01b0da22c9b 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>>   return redirect_call (instance);
>> }
>> 
>> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
>> +   Pass the converted variables to the callback FP, and finally convert the
>> +   result back to the original type.  Add the necessary conversion statements.
>> +   Return the new call.  */
>> +gimple *
>> +gimple_folder::convert_and_fold (tree type,
>> +                                 gimple *(*fp) (gimple_folder &,
>> +                                                tree, vec<tree> &))
>> +{
>> +  gcc_assert (VECTOR_TYPE_P (type)
>> +              && TYPE_MODE (type) != VNx16BImode);
>> +  tree old_ty = TREE_TYPE (lhs);
>> +  gimple_seq stmts = NULL;
>> +  tree lhs_conv, op, op_ty, t;
>> +  gimple *g, *new_stmt;
>> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
>> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
>> +  unsigned int num_args = gimple_call_num_args (call);
>> +  vec<tree> args_conv = vNULL;
>> +  args_conv.safe_grow (num_args);
>> +  for (unsigned int i = 0; i < num_args; ++i)
>> +    {
>> +      op = gimple_call_arg (call, i);
>> +      op_ty = TREE_TYPE (op);
>> +      args_conv[i] =
>> +        (VECTOR_TYPE_P (op_ty)
>> +         && TREE_CODE (op) != VECTOR_CST
>> +         && TYPE_MODE (op_ty) != VNx16BImode
>> +         && !useless_type_conversion_p (op_ty, type))
>> +        ? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
>> +    }
>> +
>> +  new_stmt = fp (*this, lhs_conv, args_conv);
>> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +  if (convert_lhs_p)
>> +    {
>> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
>> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
>> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
>> +    }
>> +  return new_stmt;
>> +}
>> +
>> /* Fold the call to constant VAL.  */
>> gimple *
>> gimple_folder::fold_to_cstu (poly_uint64 val)
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 4094f8207f9..3e919e52e2c 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -406,6 +406,7 @@ public:
>>   tree scalar_type (unsigned int) const;
>>   tree vector_type (unsigned int) const;
>>   tree tuple_type (unsigned int) const;
>> +  tree signed_type (unsigned int) const;
>>   unsigned int elements_per_vq (unsigned int) const;
>>   machine_mode vector_mode (unsigned int) const;
>>   machine_mode tuple_mode (unsigned int) const;
>> @@ -632,6 +633,8 @@ public:
>> 
>>   gcall *redirect_call (const function_instance &);
>>   gimple *redirect_pred_x ();
>> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
>> +                                               tree, vec<tree> &));
>> 
>>   gimple *fold_to_cstu (poly_uint64);
>>   gimple *fold_to_pfalse ();
>> @@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>>   gcc_unreachable ();
>> }
>> 
>> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
>> +inline type_suffix_index
>> +signed_type_suffix_index (unsigned int element_bits)
>> +{
>> +  switch (element_bits)
>> +    {
>> +    case 8: return TYPE_SUFFIX_s8;
>> +    case 16: return TYPE_SUFFIX_s16;
>> +    case 32: return TYPE_SUFFIX_s32;
>> +    case 64: return TYPE_SUFFIX_s64;
>> +    }
>> +  gcc_unreachable ();
>> +}
>> +
>> /* Return the single field in tuple type TYPE.  */
>> inline tree
>> tuple_type_field (tree type)
>> @@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
>>   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>> }
>> 
>> +/* Return the signed vector type of width ELEMENT_BITS.  */
>> +inline tree
>> +function_instance::signed_type (unsigned int element_bits) const
>> +{
>> +  switch (element_bits)
>> +    {
>> +    case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
>> +    case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
>> +    case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
>> +    case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
>> +    }
>> +  gcc_unreachable ();
>> +}
>> +
>> /* Return the number of elements of type suffix I that fit within a
>>    128-bit block.  */
>> inline unsigned int
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> index bdf6fcb98d6..e228dc5995d 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>> 
>> /*
>> ** mul_m1_u16_m:
>> -**	mov	(z[0-9]+)\.b, #-1
>> -**	mul	z0\.h, p0/m, z0\.h, \1\.h
>> +**	neg	z0\.h, p0/m, z0\.h
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
>> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>> 
>> /*
>> ** mul_m1_u16_x:
>> -**	mul	z0\.h, z0\.h, #-1
>> +**	neg	z0\.h, p0/m, z0\.h
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> index a61e85fa12d..e8f52c9d785 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>> 
>> /*
>> ** mul_m1_u32_m:
>> -**	mov	(z[0-9]+)\.b, #-1
>> -**	mul	z0\.s, p0/m, z0\.s, \1\.s
>> +**	neg	z0\.s, p0/m, z0\.s
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
>> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>> 
>> /*
>> ** mul_m1_u32_x:
>> -**	mul	z0\.s, z0\.s, #-1
>> +**	neg	z0\.s, p0/m, z0\.s
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> index eee1f8a0c99..2ccdc3642c5 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>> 
>> /*
>> ** mul_m1_u64_m:
>> -**	mov	(z[0-9]+)\.b, #-1
>> -**	mul	z0\.d, p0/m, z0\.d, \1\.d
>> +**	neg	z0\.d, p0/m, z0\.d
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>> 		z0 = svmul_n_u64_m (p0, z0, -1),
>> 		z0 = svmul_m (p0, z0, -1))
>> 
>> +/*
>> +** mul_m1r_u64_m:
>> +**	mov	(z[0-9]+)\.b, #-1
>> +**	mov	(z[0-9]+\.d), z0\.d
>> +**	movprfx	z0, \1
>> +**	neg	z0\.d, p0/m, \2
>> +**	ret
>> +*/
>> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
>> +		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
>> +		z0 = svmul_m (p0, svdup_u64 (-1), z0))
>> +
>> /*
>> ** mul_u64_z_tied1:
>> **	movprfx	z0\.d, p0/z, z0\.d
>> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>> 
>> /*
>> ** mul_m1_u64_x:
>> -**	mul	z0\.d, z0\.d, #-1
>> +**	neg	z0\.d, p0/m, z0\.d
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>> 		z0 = svmul_n_u64_x (p0, z0, -1),
>> 		z0 = svmul_x (p0, z0, -1))
>> 
>> +/*
>> +** mul_m1r_u64_x:
>> +**	neg	z0\.d, p0/m, z0\.d
>> +**	ret
>> +*/
>> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
>> +		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
>> +		z0 = svmul_x (p0, svdup_u64 (-1), z0))
>> +
>> /*
>> ** mul_m127_u64_x:
>> **	mul	z0\.d, z0\.d, #-127
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> index 06ee1b3e7c8..8e53a4821f0 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>> 
>> /*
>> ** mul_m1_u8_m:
>> -**	mov	(z[0-9]+)\.b, #-1
>> -**	mul	z0\.b, p0/m, z0\.b, \1\.b
>> +**	neg	z0\.b, p0/m, z0\.b
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
>> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>> 
>> /*
>> ** mul_255_u8_x:
>> -**	mul	z0\.b, z0\.b, #-1
>> +**	neg	z0\.b, p0/m, z0\.b
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>> 
>> /*
>> ** mul_m1_u8_x:
>> -**	mul	z0\.b, z0\.b, #-1
>> +**	neg	z0\.b, p0/m, z0\.b
>> **	ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
>> -- 
>> 2.44.0
>> 
>>> 
>>> Richard
> 
> 
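A small scalar C check (illustrative only, not part of the thread) of the
identities behind the fold: in N-bit unsigned arithmetic, multiplying by the
all-ones constant is the same as negating, which is why svmul by -1 can be
folded to svneg for unsigned types, whereas dividing by the all-ones constant
is an equality test against the maximum value rather than a negation.

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint64_t tests[] = { 0, 1, 2, 42, UINT64_MAX - 1, UINT64_MAX };
  for (unsigned int i = 0; i < sizeof tests / sizeof tests[0]; i++)
    {
      uint64_t x = tests[i];
      /* x * (2^64 - 1) wraps to 2^64 - x, i.e. unsigned negation.  */
      assert (x * UINT64_MAX == (uint64_t) -x);
      /* Division by the all-ones value is not negation: it yields 1 for
         x == UINT64_MAX and 0 otherwise.  */
      assert (x / UINT64_MAX == (x == UINT64_MAX ? 1 : 0));
    }
  return 0;
}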