Ping.
Thanks,
Jennifer

> On 26 Nov 2024, at 09:18, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>
>
>
>> On 20 Nov 2024, at 13:43, Richard Sandiford <richard.sandif...@arm.com>
>> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandif...@arm.com>
>>>> wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>>> As follow-up to
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>>>> does this fold for signed types and feed it as callback to a helper
>>>>> function
>>>>> that adds the necessary type conversions.
>>>>
>>>> I only meant that we should do this for multiplication (since the sign
>>>> isn't relevant for N-bit x N-bit -> N-bit multiplication). It wouldn't
>>>> be right for unsigned division, since unsigned division by the maximum
>>>> value is instead equivalent to x == MAX ? MAX : 0.
>>>>
>>>> Some comments on the multiplication bit below:
>>>>
>>>>>
>>>>> For example, for the test case
>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>> {
>>>>>   return svmul_n_u64_x (pg, x, -1);
>>>>> }
>>>>>
>>>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>> {
>>>>>   svuint64_t D.12921;
>>>>>   svint64_t D.12920;
>>>>>   svuint64_t D.12919;
>>>>>
>>>>>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>>>   D.12921 = svneg_s64_x (pg, D.12920);
>>>>>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>>>   goto <D.12922>;
>>>>>   <D.12922>:
>>>>>   return D.12919;
>>>>> }
>>>>>
>>>>> In general, the new helper gimple_folder::convert_and_fold
>>>>> - takes a target type and a function pointer,
>>>>> - converts all non-boolean vector types to the target type,
>>>>> - replaces the converted arguments in the function call,
>>>>> - calls the callback function,
>>>>> - adds the necessary view converts to the gimple sequence,
>>>>> - and returns the new call.
>>>>>
>>>>> Because all arguments are converted to the same target types, the helper
>>>>> function is only suitable for folding calls whose arguments are all of
>>>>> the same type. If necessary, this could be extended to convert the
>>>>> arguments to different types differentially.
>>>>>
>>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>>> OK for mainline?
>>>>>
>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	* config/aarch64/aarch64-sve-builtins-base.cc
>>>>> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>>> 	function and pass to gimple_folder::convert_and_fold to enable
>>>>> 	the transform for unsigned types.
>>>>> 	(svdiv_impl::fold): Likewise.
>>>>> 	* config/aarch64/aarch64-sve-builtins.cc
>>>>> 	(gimple_folder::convert_and_fold): New function that converts
>>>>> 	operands to target type before calling callback function, adding the
>>>>> 	necessary conversion statements.
>>>>> 	* config/aarch64/aarch64-sve-builtins.h
>>>>> 	(gimple_folder::convert_and_fold): Declare function.
>>>>> 	(signed_type_suffix_index): Return type_suffix_index of signed
>>>>> 	vector type for given width.
>>>>> 	(function_instance::signed_type): Return signed vector type for
>>>>> 	given width.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>>> 	outcome.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>>> 	expected outcome.
>>>>> ---
>>>>> .../aarch64/aarch64-sve-builtins-base.cc | 99 ++++++++++++-------
>>>>> gcc/config/aarch64/aarch64-sve-builtins.cc | 40 ++++++++
>>>>> gcc/config/aarch64/aarch64-sve-builtins.h | 30 ++++++
>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c | 9 ++
>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c | 9 ++
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c | 5 +-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c | 5 +-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c | 7 +-
>>>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> index 1c9f515a52c..6df14a8f4c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> [...]
>>>>> @@ -2082,33 +2091,49 @@ public:
>>>>>      return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>>>>
>>>>>      /* If one of the operands is all integer -1, fold to svneg.  */
>>>>> -    tree pg = gimple_call_arg (f.call, 0);
>>>>> -    tree negated_op = NULL;
>>>>> -    if (integer_minus_onep (op2))
>>>>> -      negated_op = op1;
>>>>> -    else if (integer_minus_onep (op1))
>>>>> -      negated_op = op2;
>>>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>>>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>>>
>>>> Formatting nit, sorry, but: indentation looks off.
>>>>
>>>>>       {
>>>>> -	function_instance instance ("svneg", functions::svneg,
>>>>> -				    shapes::unary, MODE_none,
>>>>> -				    f.type_suffix_ids, GROUP_none, f.pred);
>>>>> -	gcall *call = f.redirect_call (instance);
>>>>> -	unsigned offset_index = 0;
>>>>> -	if (f.pred == PRED_m)
>>>>> +	auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>>> 	  {
>>>>> -	    offset_index = 1;
>>>>> -	    gimple_call_set_arg (call, 0, op1);
>>>>> -	  }
>>>>> -	else
>>>>> -	  gimple_set_num_ops (call, 5);
>>>>> -	gimple_call_set_arg (call, offset_index, pg);
>>>>> -	gimple_call_set_arg (call, offset_index + 1, negated_op);
>>>>> -	return call;
>>>>
>>>> Rather than having convert_and_fold modify the call in-place (which
>>>> would create an incorrectly typed call if we leave the function itself
>>>> unchanged), could we make it pass back a "tree" that contains the
>>>> (possibly converted) lhs and a "vec<tree> &" that contains the
>>>> (possibly converted) arguments?
>>> Dear Richard,
>>> I agree that it's preferable to not have convert_and_fold modify the call
>>> in-place and wanted to ask for clarification about which sort of design you
>>> had in mind:
>>> Option 1:
>>> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv),
>>> such that the function takes a target type TYPE to which all non-boolean
>>> vector operands are converted and a reference for a vec<tree> that it fills
>>> with converted argument trees. It returns a tree with the (possibly
>>> converted) lhs. It also adds the view convert statements, but makes no
>>> changes to the call itself.
>>> The caller of the function uses the converted arguments and lhs to assemble
>>> the new gcall.
>>>
>>> Option 2:
>>> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree,
>>> vec<tree>, gimple_folder &), where the function converts the lhs and
>>> arguments to TYPE and assigns them to the newly created tree lhs_conv and
>>> vec<tree> args_conv that are passed to the function pointer FP. The
>>> callback assembles the new call and returns it to convert_and_fold, which
>>> adds the necessary conversion statements before returning the new call to
>>> the caller. So, in this case it would be the callback modifying the call.
>>
>> Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
>> I suppose the callback should return a gimple * rather than a gcall *
>> though, in case the callback wants to create a gassign instead.
>>
>> (Thanks for checking btw)
> Thanks for clarifying. Below you find the updated patch. I made the following
> changes:
> - removed folding of svdiv and reverted the test cases
> - pass lhs_conv and args_conv to the callback instead of having convert_and_fold
> change the call in place
> - re-validated, no regression.
> Thanks,
> Jennifer
>
>
> As a follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul by -1 to svneg for
> unsigned SVE vector types. The key idea is to reuse the existing code that
> does this fold for signed types and feed it as a callback to a helper function
> that adds the necessary type conversions.
>
> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   return svmul_n_u64_x (pg, x, -1);
> }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   svint64_t D.12921;
>   svint64_t D.12920;
>   svuint64_t D.12919;
>
>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>   D.12921 = svneg_s64_x (pg, D.12920);
>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>   goto <D.12922>;
>   <D.12922>:
>   return D.12919;
> }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts the lhs and all non-boolean vector types to the target type,
> - passes the converted lhs and arguments to the callback,
> - receives the new gimple statement from the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target type, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type. If necessary, this could be extended to convert the
> arguments to different types individually.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
> 	function and pass to gimple_folder::convert_and_fold to enable
> 	the transform for unsigned types.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(gimple_folder::convert_and_fold): New function that converts
> 	operands to target type before calling callback function, adding the
> 	necessary conversion statements.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(gimple_folder::convert_and_fold): Declare function.
> 	(signed_type_suffix_index): Return type_suffix_index of signed
> 	vector type for given width.
> 	(function_instance::signed_type): Return signed vector type for
> 	given width.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> 	expected outcome.
> ---
> .../aarch64/aarch64-sve-builtins-base.cc | 70 +++++++++++++------
> gcc/config/aarch64/aarch64-sve-builtins.cc | 43 ++++++++++++
> gcc/config/aarch64/aarch64-sve-builtins.h | 31 ++++++++
> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c | 5 +-
> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c | 5 +-
> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c | 7 +-
> 7 files changed, 153 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 87e9909b55a..52401a8c57a 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -2092,33 +2092,61 @@ public:
>      return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>
>      /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>        {
> -	function_instance instance ("svneg", functions::svneg,
> -				    shapes::unary, MODE_none,
> -				    f.type_suffix_ids, GROUP_none, f.pred);
> -	gcall *call = f.redirect_call (instance);
> -	unsigned offset_index = 0;
> -	if (f.pred == PRED_m)
> +	auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
> +			    vec<tree> &args_conv) -> gimple *
> 	  {
> -	    offset_index = 1;
> -	    gimple_call_set_arg (call, 0, op1);
> -	  }
> -	else
> -	  gimple_set_num_ops (call, 5);
> -	gimple_call_set_arg (call, offset_index, pg);
> -	gimple_call_set_arg (call, offset_index + 1, negated_op);
> -	return call;
> +	    gcc_assert (lhs_conv && args_conv.length () == 3);
> +	    tree pg = args_conv[0];
> +	    tree op1 = args_conv[1];
> +	    tree op2 = args_conv[2];
> +	    tree negated_op = op1;
> +	    bool negate_op1 = true;
> +	    if (integer_minus_onep (op1))
> +	      {
> +		negated_op = op2;
> +		negate_op1 = false;
> +	      }
> +	    type_suffix_pair signed_tsp =
> +	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
> +	       f.type_suffix_ids[1]};
> +	    function_instance instance ("svneg", functions::svneg,
> +					shapes::unary, MODE_none,
> +					signed_tsp, GROUP_none, f.pred);
> +	    gcall *call = f.redirect_call (instance);
> +	    gimple_call_set_lhs (call, lhs_conv);
> +	    unsigned offset = 0;
> +	    tree fntype, op1_type = TREE_TYPE (op1);
> +	    if (f.pred == PRED_m)
> +	      {
> +		offset = 1;
> +		tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
> +		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +						    3, arg_types);
> +		tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +		tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
> +		gimple_call_set_arg (call, 0, inactive);
> +	      }
> +	    else
> +	      {
> +		gimple_set_num_ops (call, 5);
> +		tree arg_types[2] = {TREE_TYPE (pg), op1_type};
> +		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +						    2, arg_types);
> +	      }
> +	    gimple_call_set_fntype (call, fntype);
> +	    gimple_call_set_arg (call, offset, pg);
> +	    gimple_call_set_arg (call, offset + 1, negated_op);
> +	    return call;
> +	  };
> +	tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +	return f.convert_and_fold (ty, mul_by_m1);
>        }
>
>      /* If one of the operands is a uniform power of 2, fold to a left shift
>         by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>      tree op1_cst = uniform_integer_cst_p (op1);
>      tree op2_cst = uniform_integer_cst_p (op2);
>      tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0fec1cd439e..01b0da22c9b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>    return redirect_call (instance);
>  }
>
> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
> +   Pass the converted variables to the callback FP, and finally convert the
> +   result back to the original type. Add the necessary conversion statements.
> +   Return the new call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type,
> +				 gimple *(*fp) (gimple_folder &,
> +						tree, vec<tree> &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +	      && TYPE_MODE (type) != VNx16BImode);
> +  tree old_ty = TREE_TYPE (lhs);
> +  gimple_seq stmts = NULL;
> +  tree lhs_conv, op, op_ty, t;
> +  gimple *g, *new_stmt;
> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
> +  unsigned int num_args = gimple_call_num_args (call);
> +  vec<tree> args_conv = vNULL;
> +  args_conv.safe_grow (num_args);
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_ty = TREE_TYPE (op);
> +      args_conv[i] =
> +	(VECTOR_TYPE_P (op_ty)
> +	 && TREE_CODE (op) != VECTOR_CST
> +	 && TYPE_MODE (op_ty) != VNx16BImode
> +	 && !useless_type_conversion_p (op_ty, type))
> +	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
> +    }
> +
> +  new_stmt = fp (*this, lhs_conv, args_conv);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  if (convert_lhs_p)
> +    {
> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +    }
> +  return new_stmt;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h
> b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 4094f8207f9..3e919e52e2c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -632,6 +633,8 @@ public:
>
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
> +					       tree, vec<tree> &));
>
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int
> element_bits)
>    gcc_unreachable ();
>  }
>
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return TYPE_SUFFIX_s8;
> +    case 16: return TYPE_SUFFIX_s16;
> +    case 32: return TYPE_SUFFIX_s32;
> +    case 64: return TYPE_SUFFIX_s64;
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +    {
> +    case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +    case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +    case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +    case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +    }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the number of elements of type suffix I that fit within a
>     128-bit block.  */
>  inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>
>  /*
>  ** mul_m1_u16_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.h, p0/m, z0\.h, \1\.h
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>
>  /*
>  ** mul_m1_u16_x:
> -**	mul	z0\.h, z0\.h, #-1
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>
>  /*
>  ** mul_m1_u32_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.s, p0/m, z0\.s, \1\.s
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>
>  /*
>  ** mul_m1_u32_x:
> -**	mul	z0\.s, z0\.s, #-1
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>
>  /*
>  ** mul_m1_u64_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.d, p0/m, z0\.d, \1\.d
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
> 		z0 = svmul_n_u64_m (p0, z0, -1),
> 		z0 = svmul_m (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_m:
> +**	mov	(z[0-9]+)\.b, #-1
> +**	mov	(z[0-9]+\.d), z0\.d
> +**	movprfx	z0, \1
> +**	neg	z0\.d, p0/m, \2
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> +		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_u64_z_tied1:
>  **	movprfx	z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>
>  /*
>  ** mul_m1_u64_x:
> -**	mul	z0\.d, z0\.d, #-1
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
> 		z0 = svmul_n_u64_x (p0, z0, -1),
> 		z0 = svmul_x (p0, z0, -1))
>
> +/*
> +** mul_m1r_u64_x:
> +**	neg	z0\.d, p0/m, z0\.d
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> +		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_m127_u64_x:
>  **	mul	z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>
>  /*
>  ** mul_m1_u8_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.b, p0/m, z0\.b, \1\.b
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>
>  /*
>  ** mul_255_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>
>  /*
>  ** mul_m1_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
> --
> 2.44.0
>
>>
>> Richard
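
A note for readers of the archive: the identity behind the fold is easy to state
at the ACLE source level. The snippet below is a minimal, compile-only sketch and
is not part of the patch above; it assumes only the standard arm_sve.h intrinsics
(svmul_n_u64_x, svneg_s64_x and the svreinterpret conversions). The equivalence
applies to the lanes selected by pg, since _x predication leaves inactive lanes
unspecified in both forms.

/* Sketch only: for unsigned elements, multiplying by -1 (i.e. by 2^64 - 1)
   is the same as negating modulo 2^64, so the multiplication can be rewritten
   as reinterpret -> svneg -> reinterpret, which is what the gimple fold emits
   via VIEW_CONVERT_EXPRs.  */
#include <arm_sve.h>

svuint64_t
mul_by_minus1 (svbool_t pg, svuint64_t x)
{
  return svmul_n_u64_x (pg, x, -1);
}

svuint64_t
neg_via_reinterpret (svbool_t pg, svuint64_t x)
{
  svint64_t sx = svreinterpret_s64_u64 (x);
  return svreinterpret_u64_s64 (svneg_s64_x (pg, sx));
}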