Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

Richard Biener Fri, 17 May 2024 03:13:29 -0700

On Fri, May 17, 2024 at 11:56 AM Tamar Christina
<tamar.christ...@arm.com> wrote:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guent...@gmail.com>
> > Sent: Friday, May 17, 2024 10:46 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Victor Do Nascimento <victor.donascime...@arm.com>; gcc-
> > patc...@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>; Richard
> > Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
> > <vicdo...@e133397.arm.com>
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
> > <tamar.christ...@arm.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <richard.guent...@gmail.com>
> > > > Sent: Friday, May 17, 2024 6:51 AM
> > > > To: Victor Do Nascimento <victor.donascime...@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > <richard.sandif...@arm.com>;
> > > > Richard Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
> > > > <vicdo...@e133397.arm.com>
> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > > autovectorizer
> > > >
> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > > > <victor.donascime...@arm.com> wrote:
> > > > >
> > > > > From: Victor Do Nascimento <vicdo...@e133397.arm.com>
> > > > >
> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > > consequence of using a direct optab for this is that backend-pattern
> > > > > selection is only ever able to match against one datatype - Either
> > > > > that of the operands or of the accumulated value, never both.
> > > > >
> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > > sufficient for full specification of all the possible dot product
> > > > > machine instructions to be matched to the code sequence; a dot product
> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > > VNx8HI values for the 4- and 2-way dot product operations, 
> > > > > respectively.
> > > > >
> > > > > This means that the following example fails autovectorization:
> > > > >
> > > > > uint32_t foo(int n, uint16_t* data) {
> > > > >   uint32_t sum = 0;
> > > > >   for (int i=0; i<n; i+=1) {
> > > > >     sum += data[i] * data[i];
> > > > >   }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > To remedy the issue a new optab is added, tentatively named
> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > > of both input and output types involved in the operation.
> > > >
> > > > I don't like this too much.  I'll note we document dot_prod as
> > > >
> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > > @item @samp{sdot_prod@var{m}}
> > > >
> > > > Compute the sum of the products of two signed elements.
> > > > Operand 1 and operand 2 are of the same mode. Their
> > > > product, which is of a wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > > result is placed in operand 0, which is of the same mode as operand 3.
> > > > @var{m} is the mode of operand 1 and operand 2.
> > > >
> > > > with no restriction on the wider mode but we don't specify it which is
> > > > bad design.  This should have been a convert optab with two modes
> > > > from the start - adding a _twoway variant is just a hack.
> > >
> > > We did discuss this at the time we started implementing it.  There was two
> > > options, one was indeed to change it to a convert dot_prod optab, but 
> > > doing
> > > this means we have to update every target that uses it.
> > >
> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> > altivec.
> > >
> > > Which sure could be possible, but there's also every use in the backends 
> > > that
> > need
> > > to be updated, and tested, which for some targets we don't even know how 
> > > to
> > begin.
> > >
> > > So it seems very hard to correct dotprod to a convert optab now.
> >
> > It's still the correct way to go.  At _least_ your new pattern should
> > have been this,
> > otherwise what do you do when you have two-way, four-way and eight-way
> > variants?
> > Add yet another optab?
>
> I guess that's fair, but having the new optab only be convert resulted in 
> messy
> code as everywhere you must check for both variants.
>
> Additionally that optab would then overlap with the existing optabs as, as you
> Say, the documentation only says it's of a wider type and doesn't indicate
> precision.
>
> So to avoid issues down the line then If the new optab isn't acceptable then
> we'll have to do a wholesale conversion then..


Yep.  It shouldn't be difficult though.

> >
> > Another thing is that when you do it your way you should fix the existing 
> > optab
> > to be two-way by documenting how the second mode derives from the first.
> >
> > And sure, it's not the only optab suffering from this issue.
>
> Sure, all the zero and sign extending optabs for instance 😊

But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.

> Tamar
>
> >
> > Richard.
> >
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > In order to minimize changes to the existing codebase,
> > > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > > > argument is added to its signature - `const_tree otype', allowing type
> > > > > information to be specified for both input and output types.  The
> > > > > existing nterface is retained by defining a new `optab_for_tree_code',
> > > > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > > > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> > > > >
> > > > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > > > directly, passing it both types, adding the internal logic to the
> > > > > function to distinguish between competing optabs.
> > > > >
> > > > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > > > ensure the new icode can be correctly selected, given the new optab.
> > > > >
> > > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > > > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > > > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_<sur>dotvnx4sivnx8hi):
> > > > >         renamed to `<sur>dot_prod_twoway_vnx8hi'.
> > > > >         * config/aarch64/aarch64-sve-builtins-base.cc 
> > > > > (svdot_impl.expand):
> > > > >         update icodes used in line with above rename.
> > > > >         * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > > > >         `optab_for_tree_code' and added new argument.
> > > > >         (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > > > >         * optabs-tree.h (optab_for_tree_code_1): New.
> > > > >         * optabs.cc (expand_widen_pattern_expr): Expand support for
> > > > >         DOT_PROD_EXPR patterns.
> > > > >         * optabs.def (udot_prod_twoway_optab): New.
> > > > >         (sdot_prod_twoway_optab): Likewise.
> > > > >         * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > > > >         support for misc optabs that use two modes.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > > > > ---
> > > > >  .../aarch64/aarch64-sve-builtins-base.cc      |  4 ++--
> > > > >  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
> > > > >  gcc/optabs-tree.cc                            | 23 ++++++++++++++++--
> > > > >  gcc/optabs-tree.h                             |  2 ++
> > > > >  gcc/optabs.cc                                 |  2 +-
> > > > >  gcc/optabs.def                                |  2 ++
> > > > >  .../gcc.dg/vect/vect-dotprod-twoway.c         | 24 
> > > > > +++++++++++++++++++
> > > > >  gcc/tree-vect-patterns.cc                     |  2 +-
> > > > >  8 files changed, 54 insertions(+), 7 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > >
> > > > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > index 0d2edf3f19e..e457db09f66 100644
> > > > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > > > @@ -764,8 +764,8 @@ public:
> > > > >        icode = (e.type_suffix (0).float_p
> > > > >                ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> > > > >                : e.type_suffix (0).unsigned_p
> > > > > -              ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> > > > > -              : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> > > > > +              ? CODE_FOR_udot_prod_twoway_vnx8hi
> > > > > +              : CODE_FOR_sdot_prod_twoway_vnx8hi);
> > > > >      return e.use_unpred_insn (icode);
> > > > >    }
> > > > >  };
> > > > > diff --git a/gcc/config/aarch64/aarch64-sve2.md
> > > > b/gcc/config/aarch64/aarch64-sve2.md
> > > > > index 934e57055d3..5677de7108d 100644
> > > > > --- a/gcc/config/aarch64/aarch64-sve2.md
> > > > > +++ b/gcc/config/aarch64/aarch64-sve2.md
> > > > > @@ -2021,7 +2021,7 @@ (define_insn
> > > > "@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
> > > > >  )
> > > > >
> > > > >  ;; Two-way dot-product.
> > > > > -(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
> > > > > +(define_insn "<sur>dot_prod_twoway_vnx8hi"
> > > > >    [(set (match_operand:VNx4SI 0 "register_operand")
> > > > >         (plus:VNx4SI
> > > > >           (unspec:VNx4SI
> > > > > diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> > > > > index b69a5bc3676..e3c5a618ea2 100644
> > > > > --- a/gcc/optabs-tree.cc
> > > > > +++ b/gcc/optabs-tree.cc
> > > > > @@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
> > > > >     cannot give complete results for multiplication or division) but 
> > > > > probably
> > > > >     ought to be relied on more widely throughout the expander.  */
> > > > >  optab
> > > > > -optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > -                    enum optab_subtype subtype)
> > > > > +optab_for_tree_code_1 (enum tree_code code, const_tree type,
> > > > > +                      const_tree otype, enum optab_subtype subtype)
> > > > >  {
> > > > >    bool trapv;
> > > > >    switch (code)
> > > > > @@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >
> > > > >      case DOT_PROD_EXPR:
> > > > >        {
> > > > > +       if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
> > > > > +                     == TYPE_PRECISION (TREE_TYPE (otype))))
> > > > > +         {
> > > > > +           if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
> > > > > +             return udot_prod_twoway_optab;
> > > > > +           if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
> > > > > +             return sdot_prod_twoway_optab;
> > > > > +         }
> > > > >         if (subtype == optab_vector_mixed_sign)
> > > > >           return usdot_prod_optab;
> > > > >
> > > > > @@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >      }
> > > > >  }
> > > > >
> > > > > +/* Return the optab used for computing the operation given by the 
> > > > > tree
> > code,
> > > > > +   CODE and the tree EXP.  This function is not always usable (for 
> > > > > example, it
> > > > > +   cannot give complete results for multiplication or division) but 
> > > > > probably
> > > > > +   ought to be relied on more widely throughout the expander.  */
> > > > > +optab
> > > > > +optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > +                    enum optab_subtype subtype)
> > > > > +{
> > > > > +  return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
> > > > > +}
> > > > > +
> > > > >  /* Check whether an operation represented by CODE is a 'half' 
> > > > > widening
> > > > operation
> > > > >     in which the input vector type has half the number of bits of the 
> > > > > output
> > > > >     vector type e.g. V8QI->V8HI.
> > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > index f2b49991462..13ff7ca2e4b 100644
> > > > > --- a/gcc/optabs-tree.h
> > > > > +++ b/gcc/optabs-tree.h
> > > > > @@ -36,6 +36,8 @@ enum optab_subtype
> > > > >  /* Return the optab used for computing the given operation on the 
> > > > > type
> > given
> > > > by
> > > > >     the second argument.  The third argument distinguishes between 
> > > > > the types
> > of
> > > > >     vector shifts and rotates.  */
> > > > > +optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
> > > > __attribute__((unused)),
> > > > > +                           enum optab_subtype );
> > > > >  optab optab_for_tree_code (enum tree_code, const_tree, enum
> > > > optab_subtype);
> > > > >  bool
> > > > >  supportable_half_widening_operation (enum tree_code, tree, tree,
> > > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > > > index ce91f94ed43..3a1c6c7b90e 100644
> > > > > --- a/gcc/optabs.cc
> > > > > +++ b/gcc/optabs.cc
> > > > > @@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx
> > > > op1, rtx wide_op,
> > > > >         gcc_unreachable ();
> > > > >
> > > > >        widen_pattern_optab
> > > > > -       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), 
> > > > > subtype);
> > > > > +       = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), 
> > > > > TREE_TYPE
> > > > (oprnd2), subtype);
> > > > >      }
> > > > >    else
> > > > >      widen_pattern_optab
> > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > index ad14f9328b9..cf1a6e7a7dc 100644
> > > > > --- a/gcc/optabs.def
> > > > > +++ b/gcc/optabs.def
> > > > > @@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
> > > > >  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> > > > >  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > >  OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > +OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
> > > > > +OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
> > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > > > >  OPTAB_D (usad_optab, "usad$I$a")
> > > > >  OPTAB_D (ssad_optab, "ssad$I$a")
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > > new file mode 100644
> > > > > index 00000000000..cba2aadbec8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do compile { target { aarch64*-*-* } } } */
> > > > > +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-
> > tree-
> > > > vect-details" { target { aarch64*-*-* } } } */
> > > > > +
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +uint32_t udot2(int n, uint16_t* data) {
> > > > > +  uint32_t sum = 0;
> > > > > +  for (int i=0; i<n; i+=1) {
> > > > > +    sum += data[i] * data[i];
> > > > > +  }
> > > > > +  return sum;
> > > > > +}
> > > > > +
> > > > > +int32_t sdot2(int n, int16_t* data) {
> > > > > +  int32_t sum = 0;
> > > > > +  for (int i=0; i<n; i+=1) {
> > > > > +    sum += data[i] * data[i];
> > > > > +  }
> > > > > +  return sum;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
> > detected:"
> > > > 2 "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
> > > > detected" 4 "vect" } } */
> > > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } 
> > > > > } */
> > > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > > index dfb7d800526..0760f25d94d 100644
> > > > > --- a/gcc/tree-vect-patterns.cc
> > > > > +++ b/gcc/tree-vect-patterns.cc
> > > > > @@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree
> > > > otype, tree_code code,
> > > > >    if (!vecotype)
> > > > >      return false;
> > > > >
> > > > > -  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > > +  optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, 
> > > > > subtype);
> > > > >    if (!optab)
> > > > >      return false;
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

Reply via email to