Victor Do Nascimento <victor.donascime...@arm.com> writes:
> Given the specification in the GCC internals manual defines the
> {u|s}dot_prod<m> standard name as taking "two signed elements of the
> same mode, adding them to a third operand of wider mode", there is
> currently ambiguity in the relationship between the mode of the first
> two arguments and that of the third.
>
> This vagueness means that, in theory, different modes may be
> supportable in the third argument.  This flexibility would allow for a
> given backend to add to the accumulator a different number of
> vectorized products, e.g. a backend may provide instructions for both:
>
>   accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
>
> and
>
>   accum += a[0] * b[0] + a[1] * b[1],
>
> as is now seen in the SVE2.1 extension to AArch64.  In spite of the
> aforementioned flexibility, modeling the dot-product operation as a
> direct optab means that we have no way to encode both input and the
> accumulator data modes into the backend pattern name, which prevents
> us from harnessing this flexibility.
>
> We therefore make all dot_prod optabs conversions, allowing, for
> example, for the encoding of both 2-way and 4-way dot product backend
> patterns.
>
> gcc/ChangeLog:
>
>       * optabs.def (sdot_prod_optab): Convert from OPTAB_D to
>       OPTAB_CD.
>       (udot_prod_optab): Likewise.
>       (usdot_prod_optab): Likewise.
>       * doc/md.texi (Standard Names): update entries for u,s and us
>       dot_prod names.
> ---
>  gcc/doc/md.texi | 46 +++++++++++++++++++++-------------------------
>  gcc/optabs.def  |  6 +++---
>  2 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5dc0d55edd6..aa1181a3320 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5760,15 +5760,14 @@ for (i = 0; i < LEN + BIAS; i++)
>      operand0 += operand2[i];
>  @end smallexample
>  
> -@cindex @code{sdot_prod@var{m}} instruction pattern
> -@item @samp{sdot_prod@var{m}}
> -
> -Compute the sum of the products of two signed elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{sdot_prod@var{m}@var{n}}
> +
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain signed elements.  Add each product to the overlapping
> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5778,15 +5777,14 @@ sdot<signed op0, signed op1, signed op2, signed op3> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{udot_prod@var{m}} instruction pattern
> -@item @samp{udot_prod@var{m}}
> +@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{udot_prod@var{m}@var{n}}
>  
> -Compute the sum of the products of two unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain unsigned elements.  Add each product to the overlapping
> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> @@ -5796,14 +5794,12 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
>  @dots{}
>  @end smallexample
>  
> -@cindex @code{usdot_prod@var{m}} instruction pattern
> -@item @samp{usdot_prod@var{m}}
> -Compute the sum of the products of elements of different signs.
> -Operand 1 must be unsigned and operand 2 signed. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{usdot_prod@var{m}@var{n}}
> +Multiply operand 1 by operand 2.  Add each product to the overlapping

The new paragraph drops the information that operand 1 is unsigned and
operand 2 is signed.  Maybe change this sentence to:

  Multiply operand 1 by operand 2 without loss of precision, given that
  operand 1 is unsigned and operand 2 is signed.

OK with that change, thanks.
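
For reference, here is a rough scalar sketch of what one accumulator
element computes in the 4-way and 2-way cases described in the commit
message, with operand 1 unsigned and operand 2 signed.  The function
names and the element widths (8-bit inputs for the 4-way form, 16-bit
inputs for the 2-way form, 32-bit accumulator in both) are assumptions
made for illustration only; they are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one accumulator element of a 4-way
   usdot_prod: four unsigned-by-signed products are widened and added
   to the accumulator.  Element widths are an assumption.  */
static int32_t
usdot4_lane (int32_t acc, const uint8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; i++)
    /* Widen before multiplying so that no precision is lost.  */
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Hypothetical scalar model of the 2-way form: only two products are
   added to each accumulator element.  Element widths are an
   assumption.  */
static int32_t
usdot2_lane (int32_t acc, const uint16_t a[2], const int16_t b[2])
{
  for (int i = 0; i < 2; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}
```

Both forms can share a 32-bit accumulator element in this sketch while
differing in input element width, which is the distinction the
two-mode pattern name is meant to capture.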

Richard

> +element of operand 3 and store the result in operand 0.  Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>  
>  Semantically the expressions perform the multiplication in the following signs
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 58a939442bd..ba860144d8b 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -110,6 +110,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
>  OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
> +OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
> +OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
> +OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
>  
>  OPTAB_CD (while_ult_optab, "while_ult$a$b")
>  
> @@ -413,10 +416,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
>  OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> -OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> -OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> -OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
