From: Victor Do Nascimento <vicdo...@e125768.arm.com>

Given the novel treatment of the dot product optab as a conversion, we
are now able to target, for a given architecture, different
relationships between output modes and input modes.
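As a rough illustration of what differing input/output mode relationships mean, the scalar sketch below models a single 32-bit accumulator lane. The helper names dot4_u8_lane and dot2_u16_lane are hypothetical and do not appear in GCC or in this patch; they only show that a four-way dot product folds four 8-bit products into one 32-bit lane, while a two-way dot product folds two 16-bit products into one.

```c
#include <stdint.h>

/* Hypothetical scalar model, not GCC code: one 32-bit accumulator lane
   of a four-way dot product (four 8-bit element pairs per lane).  */
static uint32_t dot4_u8_lane(uint32_t acc, const uint8_t a[4],
                             const uint8_t b[4]) {
  for (int k = 0; k < 4; k++)
    acc += (uint32_t)a[k] * b[k];  /* widen before multiplying */
  return acc;
}

/* Hypothetical scalar model, not GCC code: one 32-bit accumulator lane
   of a two-way dot product (two 16-bit element pairs per lane).  */
static uint32_t dot2_u16_lane(uint32_t acc, const uint16_t a[2],
                              const uint16_t b[2]) {
  for (int k = 0; k < 2; k++)
    acc += (uint32_t)a[k] * b[k];  /* cast avoids signed int overflow */
  return acc;
}
```

Both helpers produce the same 32-bit output type; only the input element width, and hence the number of products per lane, differs.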
This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

while the following wasn't:

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

Under the new treatment of the dot product optab, they are both now
vectorizable.

This adds the relevant target-agnostic check to ensure this behaviour
in the autovectorizer.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..5caa7b81fce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* Ensure both the two-way and four-way dot products are autovectorized.  */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-- 
2.34.1