On Thu, Jul 11, 2024 at 9:03 AM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi Victor,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascime...@arm.com>
> > Sent: Wednesday, July 10, 2024 3:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; Richard Earnshaw
> > <richard.earns...@arm.com>; Victor Do Nascimento
> > <vicdo...@e125768.arm.com>
> > Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different
> > dot-prod modes.
> >
> > From: Victor Do Nascimento <vicdo...@e125768.arm.com>
> >
> > Given the novel treatment of the dot product optab as a conversion we
> > are now able to target, for a given architecture, different
> > relationships between output modes and input modes.
> >
> > This is made clearer by way of example. Previously, on AArch64, the
> > following loop was vectorizable:
> >
> > uint32_t udot4(int n, uint8_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i+=1)
> >     sum += data[i] * data[i];
> >   return sum;
> > }
> >
> > while the following wasn't:
> >
> > uint32_t udot2(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i+=1)
> >     sum += data[i] * data[i];
> >   return sum;
> > }
> >
> > Under the new treatment of the dot product optab, they are both now
> > vectorizable.
> >
> > This adds the relevant target-agnostic check to ensure this behaviour
> > in the autovectorizer.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 38 +++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > new file mode 100644
> > index 00000000000..5caa7b81fce
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* Ensure both the two-way and four-way dot products are autovectorized. */
> > +#include <stdint.h>
> > +
> > +uint32_t udot4(int n, uint8_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot4(int n, int8_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +uint32_t udot2(int n, uint16_t* data) {
> > +  uint32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +int32_t sdot2(int n, int16_t* data) {
> > +  int32_t sum = 0;
> > +  for (int i=0; i<n; i+=1) {
> > +    sum += data[i] * data[i];
> > +  }
> > +  return sum;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>
> These tests only test that you have vectorized the loops, not that the loop
> was vectorized using dotprod. I think you want to have a scan for
> DOT_PROD_EXPR as well, gated to the targets that support two-way dot prod.

Ideally they'd also verify correctness, so they should have runtime checks as well.

> Cheers,
> Tamar
>
> > --
> > 2.34.1
>
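
To make the runtime-check suggestion concrete, here is a minimal sketch of what such a check could look like. The four kernels are taken from the patch; the array length `N`, the input values and the `run_checks` helper are assumptions made for illustration and are not part of the patch. The reference sums use `volatile` accumulators purely to keep those loops scalar in this sketch; a real testsuite file would probably use whatever idiom the vect tests already use for that.

```c
#include <stdint.h>

#define N 64  /* Hypothetical array length, chosen for illustration.  */

/* Kernels under test, as in the patch.  */
uint32_t udot4 (int n, uint8_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

int32_t sdot4 (int n, int8_t *data)
{
  int32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

uint32_t udot2 (int n, uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

int32_t sdot2 (int n, int16_t *data)
{
  int32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

/* Hypothetical helper: fill the inputs, compute scalar reference sums and
   return the number of kernels whose result differs from the reference.  */
int run_checks (void)
{
  uint8_t u8[N];
  int8_t s8[N];
  uint16_t u16[N];
  int16_t s16[N];
  /* volatile accumulators keep the reference loop from being vectorized.  */
  volatile uint32_t uref = 0;
  volatile int32_t sref = 0;

  for (int i = 0; i < N; i += 1)
    {
      int v = i - N / 2;  /* Signed inputs cover -N/2 .. N/2 - 1.  */
      u8[i] = (uint8_t) i;
      u16[i] = (uint16_t) i;
      s8[i] = (int8_t) v;
      s16[i] = (int16_t) v;
      uref += (uint32_t) (i * i);
      sref += v * v;
    }

  int failures = 0;
  failures += udot4 (N, u8) != uref;
  failures += udot2 (N, u16) != uref;
  failures += sdot4 (N, s8) != sref;
  failures += sdot2 (N, s16) != sref;
  return failures;
}
```

In a testsuite file, `main` would call `run_checks ()` and abort on a nonzero result, with the file switched from `dg-do compile` to `dg-do run`. The DOT_PROD_EXPR scan Tamar asks for would still be needed on top of this, since a runtime check only shows the result is correct, not which pattern was used.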