Victor Do Nascimento <victor.donascime...@arm.com> writes:
> Given recent changes to the dot_prod standard pattern name, this patch
> fixes the aarch64 back-end by implementing the following changes:
>
> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
> 2. Rewrite initialization and function expansion mechanism for simd
> builtins.
> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
> builtins.
>
> Finally, given that it is now possible for the compiler to
> differentiate between the two- and four-way dot product, we add a test
> to ensure that autovectorization picks up on dot-product patterns
> where the result is twice the width of the operands.
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md
>       (<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
>       (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
>       (usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
>       (usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
>       (<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
>       (popcount<mode2>): fix use of `udot_prod_optab'.
>       * gcc/config/aarch64/aarch64-sve.md
>       (<sur>dot_prod<vsi2qi>): Renamed to...
>       (<sur>dot_prod<mode><vsi2qi>): ...this.
>       (@<sur>dot_prod<vsi2qi>): Renamed to...
>       (@<sur>dot_prod<mode><vsi2qi>): ...this.
>       (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
>       * gcc/config/aarch64/aarch64-sve2.md
>       (@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
>       (<sur>dot_prodvnx4sivnx8hi): ...this.
>       * config/aarch64/aarch64-simd-builtins.def: Modify macro
>       expansion-based initialization and expansion
>       of (u|s|us)dot_prod builtins.
>       * config/aarch64/aarch64-sve-builtins-base.cc
>       (svdot_impl::expand): s/direct/convert/ in
>       `convert_optab_handler_for_sign' function call.
>       (svusdot_impl::expand): add second mode argument in call to
>       `code_for_dot_prod'.
>       * config/aarch64/aarch64-sve-builtins.cc
>       (function_expander::convert_optab_handler_for_sign): New class
>       method.
>       * config/aarch64/aarch64-sve-builtins.h
>       (class function_expander): Add prototype for new
>       `convert_optab_handler_for_sign' method.
>
> gcc/testsuite/ChangeLog:
>       * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.

Could you run the patch through contrib/check_GNU_style.py to catch
the long lines?

> ---
>  gcc/config/aarch64/aarch64-builtins.cc        |  7 ++++++
>  gcc/config/aarch64/aarch64-simd-builtins.def  |  6 ++---
>  gcc/config/aarch64/aarch64-simd.md            |  9 ++++---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 13 +++++-----
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +++
>  gcc/config/aarch64/aarch64-sve.md             |  6 ++---
>  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>  .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++++++++++++++
>  9 files changed, 71 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0a560eaedca..975eca0bbd6 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign 
> (optab signed_op,
>    return ::direct_optab_handler (op, mode);
>  }
>  
> +/* Choose between signed and unsigned convert optabs SIGNED_OP and
> +   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
> +   pick the appropriate optab handler for the mode.  Use MODE as the
> +   mode if given, otherwise use the mode of type suffix SUFFIX_I.  */

The last sentence needs to be adapted for this function.  Also, because
there is no longer a single mode, I don't think it makes sense to allow
a default.  So how about:

/* Choose between signed and unsigned convert optabs SIGNED_OP and
   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
   pick the appropriate optab handler for "converting" from FROM_MODE
   to TO_MODE.  */

> +insn_code
> +function_expander::convert_optab_handler_for_sign (optab signed_op,
> +                                                optab unsigned_op,
> +                                                unsigned int suffix_i,
> +                                                machine_mode to_mode,
> +                                                machine_mode from_mode)
> +{
> +  if (from_mode == VOIDmode)
> +    from_mode = vector_mode (suffix_i);

This code would then be removed.

> +  optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
> +  return ::convert_optab_handler (op, to_mode, from_mode);
> +}
> +
>  /* Return true if X overlaps any input.  */
>  bool
>  function_expander::overlaps_input_p (rtx x)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
> b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9ab6f202c30..7534a58c3d7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -659,6 +659,9 @@ public:
>    insn_code direct_optab_handler (optab, unsigned int = 0);
>    insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
>                                          machine_mode = E_VOIDmode);
> +  insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
> +                                         machine_mode = E_VOIDmode,
> +                                         machine_mode = E_VOIDmode);

and the "= E_VOIDmode"s here too.

>    machine_mode result_mode () const;
>  
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c 
> b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..453f3a75e6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> @@ -0,0 +1,25 @@
> +/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */

Could you remove the -march option in favour of:

  #pragma GCC target "+sme2"

?  That way, we honour the user's test flags if they already include sme2.

LGTM otherwise, thanks.

Richard

> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) __arm_streaming
> +{
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) __arm_streaming
> +{
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, 
> z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, 
> z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */

Reply via email to