Victor Do Nascimento <victor.donascime...@arm.com> writes: > Given recent changes to the dot_prod standard pattern name, this patch > fixes the aarch64 back-end by implementing the following changes: > > 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. > 2. Rewrite initialization and function expansion mechanism for simd > builtins. > 3. Fix all direct calls to back-end `dot_prod' patterns in SVE > builtins. > > Finally, given that it is now possible for the compiler to > differentiate between the two- and four-way dot product, we add a test > to ensure that autovectorization picks up on dot-product patterns > where the result is twice the width of the operands. > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md > (<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to... > (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this. > (usdot_prod<vsi2qi><vczle><vczbe>): Renamed to... > (usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this. > (<su>sadv16qi): Adjust call to gen_udot_prod take second mode. > (popcount<mode2>): fix use of `udot_prod_optab'. > * gcc/config/aarch64/aarch64-sve.md > (<sur>dot_prod<vsi2qi>): Renamed to... > (<sur>dot_prod<mode><vsi2qi>): ...this. > (@<sur>dot_prod<vsi2qi>): Renamed to... > (@<sur>dot_prod<mode><vsi2qi>): ...this. > (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode. > * gcc/config/aarch64/aarch64-sve2.md > (@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to... > (<sur>dot_prodvnx4sivnx8hi): ...this. > * config/aarch64/aarch64-simd-builtins.def: Modify macro > expansion-based initialization and expansion > of (u|s|us)dot_prod builtins. > * config/aarch64/aarch64-sve-builtins-base.cc > (svdot_impl::expand): s/direct/convert/ in > `convert_optab_handler_for_sign' function call. > (svusdot_impl::expand): add second mode argument in call to > `code_for_dot_prod'. > * config/aarch64/aarch64-sve-builtins.cc > (function_expander::convert_optab_handler_for_sign): New class > method. > * config/aarch64/aarch64-sve-builtins.h > (class function_expander): Add prototype for new > `convert_optab_handler_for_sign' method. > > gcc/testsuite/ChangeLog: > * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
Could you run the patch through contrib/check_GNU_style.py to catch the long lines? > --- > gcc/config/aarch64/aarch64-builtins.cc | 7 ++++++ > gcc/config/aarch64/aarch64-simd-builtins.def | 6 ++--- > gcc/config/aarch64/aarch64-simd.md | 9 ++++--- > .../aarch64/aarch64-sve-builtins-base.cc | 13 +++++----- > gcc/config/aarch64/aarch64-sve-builtins.cc | 17 +++++++++++++ > gcc/config/aarch64/aarch64-sve-builtins.h | 3 +++ > gcc/config/aarch64/aarch64-sve.md | 6 ++--- > gcc/config/aarch64/aarch64-sve2.md | 2 +- > .../aarch64/sme/vect-dotprod-twoway.c | 25 +++++++++++++++++++ > 9 files changed, 71 insertions(+), 17 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c > [...] > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > index 0a560eaedca..975eca0bbd6 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign > (optab signed_op, > return ::direct_optab_handler (op, mode); > } > > +/* Choose between signed and unsigned convert optabs SIGNED_OP and > + UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then > + pick the appropriate optab handler for the mode. Use MODE as the > + mode if given, otherwise use the mode of type suffix SUFFIX_I. */ The last sentence needs to be adapted for this function. Also, because there is no longer a single mode, I don't think it makes sense to allow a default. So how about: /* Choose between signed and unsigned convert optabs SIGNED_OP and UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then pick the appropriate optab handler for "converting" from FROM_MODE to TO_MODE. */ > +insn_code > +function_expander::convert_optab_handler_for_sign (optab signed_op, > + optab unsigned_op, > + unsigned int suffix_i, > + machine_mode to_mode, > + machine_mode from_mode) > +{ > + if (from_mode == VOIDmode) > + from_mode = vector_mode (suffix_i); This code would then be removed. > + optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op; > + return ::convert_optab_handler (op, to_mode, from_mode); > +} > + > /* Return true if X overlaps any input. */ > bool > function_expander::overlaps_input_p (rtx x) > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h > b/gcc/config/aarch64/aarch64-sve-builtins.h > index 9ab6f202c30..7534a58c3d7 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.h > +++ b/gcc/config/aarch64/aarch64-sve-builtins.h > @@ -659,6 +659,9 @@ public: > insn_code direct_optab_handler (optab, unsigned int = 0); > insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0, > machine_mode = E_VOIDmode); > + insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0, > + machine_mode = E_VOIDmode, > + machine_mode = E_VOIDmode); and the "= E_VOIDmode"s here too. > machine_mode result_mode () const; > > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c > b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c > new file mode 100644 > index 00000000000..453f3a75e6f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c > @@ -0,0 +1,25 @@ > +/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */ Could you remove the -march option in favour of: #pragma GCC target "+sme2" ? That way, we honour the user's test flags if they already include sme2. LGTM otherwise, thanks. Richard > + > +#include <stdint.h> > + > +uint32_t udot2(int n, uint16_t* data) __arm_streaming > +{ > + uint32_t sum = 0; > + for (int i=0; i<n; i+=1) { > + sum += data[i] * data[i]; > + } > + return sum; > +} > + > +int32_t sdot2(int n, int16_t* data) __arm_streaming > +{ > + int32_t sum = 0; > + for (int i=0; i<n; i+=1) { > + sum += data[i] * data[i]; > + } > + return sum; > +} > + > +/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, > z[0-9]+\.h\n} 5 } } */ > +/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, > z[0-9]+\.h\n} 5 } } */ > +/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */