From: Victor Do Nascimento <vicdo...@e125768.arm.com>

Given the novel treatment of the dot product optab as a conversion we
are now able to target, for a given architecture, different
relationships between output modes and input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

while the following wasn't:

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

Under the new treatment of the dot product optab, they are both now
vectorizable.

This adds the relevant target-agnostic check to ensure this behaviour
in the autovectorizer.

gcc/testsuite/ChangeLog:

          * gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c 
b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..5caa7b81fce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* Ensure both the two-way and four-way dot products are autovectorized.  */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-- 
2.34.1

Reply via email to