Christophe Lyon <christophe.l...@linaro.org> writes:
> This patch adds support for auto-vectorization of average value
> computation using vhadd or vrhadd, for both MVE and Neon.
>
> The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> vec-common.md, I'm not sure how to factorize them without introducing
> an unspec iterator?
Yeah, an int iterator would be one way, but I'm not sure it would make
things better given the differences in how Neon and MVE handle their
unspecs.

> It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
>
> Vectorization works with 8-bit and 16 bit input/output vectors, but
> not with 32-bit ones because the vectorizer expects wider types
> availability for the intermediate values, but int32_t + int32_t does
> not involve wider types in the IR.

Right.  Like you say, it's only valid to use V(R)HADD if, in the source
code, the addition and shift have a wider precision than the operands.
That happens naturally for 8-bit and 16-bit operands, since C arithmetic
promotes them to "int" first.  But for 32-bit operands, the C code needs
to do the addition and shift in 64 bits.  Doing them in 64 bits should be
fine for narrower operands too.  So:

> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> new file mode 100644
> index 00000000000..40489ecc67d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <stdint.h>
> +
> +#define FUNC(SIGN, TYPE, BITS, OP, NAME)                                \
> +  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
> +                                          TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> +    int i;                                                              \
> +    for (i=0; i < (128 / BITS); i++) {                                  \
> +      dest[i] = (a[i] OP b[i]) >> 1;                                    \
> +    }                                                                   \
> +}
> +

…it should work if you make this "((int64_t) a[i] OP b[i]) >> 1".

> As noted in neon-vhadd-1.c, I couldn't write a test able to use Neon
> vectorization with 64-bit vectors: we default to
> -mvectorize-with-neon-quad, and attempts to use
> -mvectorize-with-neon-double resulted in much worse code, which this
> patch does not aim at improving.

I guess this is because the MVE_2 mode iterators only include 128-bit
types.  Leaving Neon double as future work sounds good though.

And yeah, the code for V(R)HADD-equivalent operations is much worse when
V(R)HADD isn't available, since the compiler really does need to double
the precision of the operands, do double-precision addition, do
double-precision shifts, and then truncate back.  So this looks like the
expected behaviour.

Thanks,
Richard
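
P.S. To make the suggestion above concrete, here is a hand-expanded
sketch of one instantiation of the FUNC macro with the proposed 64-bit
widening.  It is not taken from the patch (the function name just follows
the macro's test_<NAME>_<SIGN><BITS> naming scheme) and is untested:

#include <stdint.h>

/* Equivalent of FUNC(s, int, 32, +, vhadd) with the widening applied:
   the addition and shift happen in int64_t, so the source-level
   arithmetic is wider than the 32-bit operands and VHADD remains a
   valid implementation of the averaging operation.  */
void test_vhadd_s32 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i = 0; i < (128 / 32); i++)
    dest[i] = ((int64_t) a[i] + b[i]) >> 1;
}

The same cast is harmless for the 8-bit and 16-bit instantiations, since
they are already promoted to a wider type ("int") before the addition.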