Dear Richard and Tamar,
Thanks to the both of you for the various bits of feedback.
I've implemented all the more straightforward bits of feedback given,
leaving "only" the merging of the two- and four-way dot product optabs
into one, together with the necessary changes to the various backends
which, though a little time-consuming, should be rather mechanical.
I had originally implemented the new two-way dotprod optab as a covert
optab anyway, so going back to the work on that local branch will give
me a good starting point from which to do this.
And Tamar, thanks very much for the feedback regarding the unit-tests.
I knew my testing as it currently is was rather anaemic and was eager to
get the relevant feedback on it. Rest assured it's all been taken on board.
Cheers,
Victor
On 5/17/24 11:13, Richard Biener wrote:
On Fri, May 17, 2024 at 11:56 AM Tamar Christina
<tamar.christ...@arm.com> wrote:
-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: Friday, May 17, 2024 10:46 AM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: Victor Do Nascimento <victor.donascime...@arm.com>; gcc-
patc...@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>; Richard
Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
<vicdo...@e133397.arm.com>
Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer
On Fri, May 17, 2024 at 11:05 AM Tamar Christina
<tamar.christ...@arm.com> wrote:
-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: Friday, May 17, 2024 6:51 AM
To: Victor Do Nascimento <victor.donascime...@arm.com>
Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
<richard.sandif...@arm.com>;
Richard Earnshaw <richard.earns...@arm.com>; Victor Do Nascimento
<vicdo...@e133397.arm.com>
Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer
On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
<victor.donascime...@arm.com> wrote:
From: Victor Do Nascimento <vicdo...@e133397.arm.com>
At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences. The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.
With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.
This means that the following example fails autovectorization:
uint32_t foo(int n, uint16_t* data) {
uint32_t sum = 0;
for (int i=0; i<n; i+=1) {
sum += data[i] * data[i];
}
return sum;
}
To remedy the issue a new optab is added, tentatively named
`udot_prod_twoway_optab', whose selection is dependent upon checking
of both input and output types involved in the operation.
I don't like this too much. I'll note we document dot_prod as
@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}
Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.
with no restriction on the wider mode but we don't specify it which is
bad design. This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.
We did discuss this at the time we started implementing it. There was two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.
Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
altivec.
Which sure could be possible, but there's also every use in the backends that
need
to be updated, and tested, which for some targets we don't even know how to
begin.
So it seems very hard to correct dotprod to a convert optab now.
It's still the correct way to go. At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way
variants?
Add yet another optab?
I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.
Additionally that optab would then overlap with the existing optabs as, as you
Say, the documentation only says it's of a wider type and doesn't indicate
precision.
So to avoid issues down the line then If the new optab isn't acceptable then
we'll have to do a wholesale conversion then..
Yep. It shouldn't be difficult though.
Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.
And sure, it's not the only optab suffering from this issue.
Sure, all the zero and sign extending optabs for instance 😊
But for example the scalar ones are correct:
OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)
Richard.
Tamar
Richard.
Tamar
Richard.
In order to minimize changes to the existing codebase,
`optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
argument is added to its signature - `const_tree otype', allowing type
information to be specified for both input and output types. The
existing nterface is retained by defining a new `optab_for_tree_code',
which serves as a shim to `optab_for_tree_code_1', passing old
parameters as-is and setting the new `optype' argument to `NULL_TREE'.
For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
directly, passing it both types, adding the internal logic to the
function to distinguish between competing optabs.
Finally, necessary changes are made to `expand_widen_pattern_expr' to
ensure the new icode can be correctly selected, given the new optab.
[1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
[2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_<sur>dotvnx4sivnx8hi):
renamed to `<sur>dot_prod_twoway_vnx8hi'.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
update icodes used in line with above rename.
* optabs-tree.cc (optab_for_tree_code_1): Renamed
`optab_for_tree_code' and added new argument.
(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
* optabs-tree.h (optab_for_tree_code_1): New.
* optabs.cc (expand_widen_pattern_expr): Expand support for
DOT_PROD_EXPR patterns.
* optabs.def (udot_prod_twoway_optab): New.
(sdot_prod_twoway_optab): Likewise.
* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
support for misc optabs that use two modes.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
.../aarch64/aarch64-sve-builtins-base.cc | 4 ++--
gcc/config/aarch64/aarch64-sve2.md | 2 +-
gcc/optabs-tree.cc | 23 ++++++++++++++++--
gcc/optabs-tree.h | 2 ++
gcc/optabs.cc | 2 +-
gcc/optabs.def | 2 ++
.../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++++++++++++++++++
gcc/tree-vect-patterns.cc | 2 +-
8 files changed, 54 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..e457db09f66 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -764,8 +764,8 @@ public:
icode = (e.type_suffix (0).float_p
? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
: e.type_suffix (0).unsigned_p
- ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
- : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+ ? CODE_FOR_udot_prod_twoway_vnx8hi
+ : CODE_FOR_sdot_prod_twoway_vnx8hi);
return e.use_unpred_insn (icode);
}
};
diff --git a/gcc/config/aarch64/aarch64-sve2.md
b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3..5677de7108d 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn
"@aarch64_sve_qsub_<sve_int_op>_lane_<mode>"
)
;; Two-way dot-product.
-(define_insn "@aarch64_sve_<sur>dotvnx4sivnx8hi"
+(define_insn "<sur>dot_prod_twoway_vnx8hi"
[(set (match_operand:VNx4SI 0 "register_operand")
(plus:VNx4SI
(unspec:VNx4SI
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index b69a5bc3676..e3c5a618ea2 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3. If not see
cannot give complete results for multiplication or division) but probably
ought to be relied on more widely throughout the expander. */
optab
-optab_for_tree_code (enum tree_code code, const_tree type,
- enum optab_subtype subtype)
+optab_for_tree_code_1 (enum tree_code code, const_tree type,
+ const_tree otype, enum optab_subtype subtype)
{
bool trapv;
switch (code)
@@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code,
const_tree type,
case DOT_PROD_EXPR:
{
+ if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
+ == TYPE_PRECISION (TREE_TYPE (otype))))
+ {
+ if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
+ return udot_prod_twoway_optab;
+ if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
+ return sdot_prod_twoway_optab;
+ }
if (subtype == optab_vector_mixed_sign)
return usdot_prod_optab;
@@ -285,6 +293,17 @@ optab_for_tree_code (enum tree_code code,
const_tree type,
}
}
+/* Return the optab used for computing the operation given by the tree
code,
+ CODE and the tree EXP. This function is not always usable (for example, it
+ cannot give complete results for multiplication or division) but probably
+ ought to be relied on more widely throughout the expander. */
+optab
+optab_for_tree_code (enum tree_code code, const_tree type,
+ enum optab_subtype subtype)
+{
+ return optab_for_tree_code_1 (code, type, NULL_TREE, subtype);
+}
+
/* Check whether an operation represented by CODE is a 'half' widening
operation
in which the input vector type has half the number of bits of the output
vector type e.g. V8QI->V8HI.
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index f2b49991462..13ff7ca2e4b 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -36,6 +36,8 @@ enum optab_subtype
/* Return the optab used for computing the given operation on the type
given
by
the second argument. The third argument distinguishes between the types
of
vector shifts and rotates. */
+optab optab_for_tree_code_1 (enum tree_code, const_tree, const_tree
__attribute__((unused)),
+ enum optab_subtype );
optab optab_for_tree_code (enum tree_code, const_tree, enum
optab_subtype);
bool
supportable_half_widening_operation (enum tree_code, tree, tree,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..3a1c6c7b90e 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -311,7 +311,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
rtx
op1, rtx wide_op,
gcc_unreachable ();
widen_pattern_optab
- = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+ = optab_for_tree_code_1 (ops->code, TREE_TYPE (oprnd0), TREE_TYPE
(oprnd2), subtype);
}
else
widen_pattern_optab
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..cf1a6e7a7dc 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -408,6 +408,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
OPTAB_D (udot_prod_optab, "udot_prod$I$a")
OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
+OPTAB_D (sdot_prod_twoway_optab, "sdot_prod_twoway_$I$a")
+OPTAB_D (udot_prod_twoway_optab, "udot_prod_twoway_$I$a")
OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
OPTAB_D (usad_optab, "usad$I$a")
OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..cba2aadbec8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sme2 -fdump-
tree-
vect-details" { target { aarch64*-*-* } } } */
+
+#include <stdint.h>
+
+uint32_t udot2(int n, uint16_t* data) {
+ uint32_t sum = 0;
+ for (int i=0; i<n; i+=1) {
+ sum += data[i] * data[i];
+ }
+ return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+ int32_t sum = 0;
+ for (int i=0; i<n; i+=1) {
+ sum += data[i] * data[i];
+ }
+ return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern:
detected:"
2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern:
detected" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfb7d800526..0760f25d94d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -233,7 +233,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
tree
otype, tree_code code,
if (!vecotype)
return false;
- optab optab = optab_for_tree_code (code, vecitype, subtype);
+ optab optab = optab_for_tree_code_1 (code, vecitype, vecotype, subtype);
if (!optab)
return false;
--
2.34.1