From: Victor Do Nascimento <[email protected]>
Given the novel treatment of the dot product optab as a conversion optab,
we are now able to target, for a given architecture, different
relationships between output modes and input modes.
This is best illustrated by example. Previously, on AArch64, the
following loop was vectorizable:
  uint32_t udot4(int n, uint8_t* data) {
    uint32_t sum = 0;
    for (int i=0; i<n; i+=1)
      sum += data[i] * data[i];
    return sum;
  }
while the following wasn't:
  uint32_t udot2(int n, uint16_t* data) {
    uint32_t sum = 0;
    for (int i=0; i<n; i+=1)
      sum += data[i] * data[i];
    return sum;
  }
Under the new treatment of the dot product optab, they are both now
vectorizable.
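To make the output/input mode relationship concrete, here is a scalar C
model of what a vectorized udot4 and udot2 compute. It is a sketch under
stated assumptions, not GCC's or AArch64's actual implementation: it
assumes a 128-bit vector with four 32-bit accumulator lanes, so in the
four-way case each lane accumulates four products of 8-bit elements
(16 inputs per step) and in the two-way case each lane accumulates two
products of 16-bit elements (8 inputs per step). The names dot4_step,
dot2_step, udot4_emulated and udot2_emulated are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one four-way dot-product-accumulate step:
   16 x uint8_t inputs feed 4 x uint32_t lanes, 4 products per lane.  */
static void dot4_step(uint32_t acc[4], const uint8_t in[16])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 4; k++)
      {
        /* Widen to uint32_t before multiplying so the arithmetic is unsigned.  */
        uint32_t x = in[4 * lane + k];
        acc[lane] += x * x;
      }
}

/* Hypothetical model of one two-way dot-product-accumulate step:
   8 x uint16_t inputs feed 4 x uint32_t lanes, 2 products per lane.  */
static void dot2_step(uint32_t acc[4], const uint16_t in[8])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 2; k++)
      {
        uint32_t x = in[2 * lane + k];
        acc[lane] += x * x;
      }
}

/* Vector-style udot4: full vectors, then lane reduction, then scalar epilogue.  */
uint32_t udot4_emulated(int n, const uint8_t *data)
{
  uint32_t acc[4] = {0, 0, 0, 0};
  int i = 0;
  for (; i + 16 <= n; i += 16)
    dot4_step(acc, data + i);
  uint32_t sum = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}

/* Vector-style udot2: same structure, but 8 elements per step.  */
uint32_t udot2_emulated(int n, const uint16_t *data)
{
  uint32_t acc[4] = {0, 0, 0, 0};
  int i = 0;
  for (; i + 8 <= n; i += 8)
    dot2_step(acc, data + i);
  uint32_t sum = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}
```

The accumulator mode is the same in both cases; only the input mode, and
hence the number of products folded into each lane, differs. Because
unsigned 32-bit addition wraps modulo 2^32 and is associative,
distributing the products across lanes and summing the lanes at the end
gives the same result as the scalar loop.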
This patch adds a target-agnostic test to check this behaviour in the
autovectorizer.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
.../gcc.dg/vect/vect-dotprod-twoway.c | 38 +++++++++++++++++++
1 file changed, 38 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..5caa7b81fce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* Ensure both the two-way and four-way dot products are autovectorized. */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
--
2.34.1