Given the specification in the GCC internals manual defines the {u|s}dot_prod<m> standard name as taking "two signed elements of the same mode, adding them to a third operand of wider mode", there is currently ambiguity in the relationship between the mode of the first two arguments and that of the third.
This vagueness means that, in theory, different modes may be supportable in the third argument. This flexibility would allow for a given backend to add to the accumulator a different number of vectorized products, e.g. A backend may provide instructions for both: accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3] and accum += a[0] * b[0] + a[1] * b[1], as is now seen in the SVE2.1 extension to AArch64. In spite of the aforementioned flexibility, modeling the dot-product operation as a direct optab means that we have no way to encode both input and the accumulator data modes into the backend pattern name, which prevents us from harnessing this flexibility. We therefore make all dot_prod optabs conversions, allowing, for example, for the encoding of both 2-way and 4-way dot product backend patterns. gcc/ChangeLog: * optabs.def (sdot_prod_optab): Convert from OPTAB_D to OPTAB_CD. (udot_prod_optab): Likewise. (usdot_prod_optab): Likewise. * doc/md.texi (Standard Names): update entries for u,s and us dot_prod names. --- gcc/doc/md.texi | 18 +++++++++--------- gcc/optabs.def | 6 +++--- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 7f4335e0aac..2a74e473f05 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5748,15 +5748,15 @@ for (i = 0; i < LEN + BIAS; i++) operand0 += operand2[i]; @end smallexample -@cindex @code{sdot_prod@var{m}} instruction pattern -@item @samp{sdot_prod@var{m}} +@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern +@item @samp{sdot_prod@var{m}@var{n}} Compute the sum of the products of two signed elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. -@var{m} is the mode of operand 1 and operand 2. +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2. Semantically the expressions perform the multiplication in the following signs @@ -5766,15 +5766,15 @@ sdot<signed op0, signed op1, signed op2, signed op3> == @dots{} @end smallexample -@cindex @code{udot_prod@var{m}} instruction pattern -@item @samp{udot_prod@var{m}} +@cindex @code{udot_prod@var{m}@var{n}} instruction pattern +@item @samp{udot_prod@var{m}@var{n}} Compute the sum of the products of two unsigned elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. -@var{m} is the mode of operand 1 and operand 2. +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2. Semantically the expressions perform the multiplication in the following signs @@ -5784,14 +5784,14 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> == @dots{} @end smallexample -@cindex @code{usdot_prod@var{m}} instruction pattern -@item @samp{usdot_prod@var{m}} +@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern +@item @samp{usdot_prod@var{m}@var{n}} Compute the sum of the products of elements of different signs. Operand 1 must be unsigned and operand 2 signed. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. -@var{m} is the mode of operand 1 and operand 2. +@var{m} is the mode of operands 0 and 3 and @var{n} the mode of operands 1 and 2. Semantically the expressions perform the multiplication in the following signs diff --git a/gcc/optabs.def b/gcc/optabs.def index 45e117a7f50..fce4b2d5b08 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -106,6 +106,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b") OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b") OPTAB_CD(vec_extract_optab, "vec_extract$a$b") OPTAB_CD(vec_init_optab, "vec_init$a$b") +OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b") +OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b") +OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b") OPTAB_CD (while_ult_optab, "while_ult$a$b") @@ -409,10 +412,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor") OPTAB_D (uavg_floor_optab, "uavg$a3_floor") OPTAB_D (savg_ceil_optab, "avg$a3_ceil") OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil") -OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") -OPTAB_D (udot_prod_optab, "udot_prod$I$a") -OPTAB_D (usdot_prod_optab, "usdot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") OPTAB_D (usad_optab, "usad$I$a") OPTAB_D (ssad_optab, "ssad$I$a") -- 2.34.1