Christophe Lyon <christophe.l...@linaro.org> writes:
> This patch adds support for auto-vectorization of average value
> computation using vhadd or vrhadd, for both MVE and Neon.
>
> The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> vec-common.md, I'm not sure how to factorize them without introducing
> an unspec iterator?

Yeah, an int iterator would be one way, but I'm not sure it would
make things better given the differences in how Neon and MVE handle
their unspecs.

> It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
>
> Vectorization works with 8-bit and 16 bit input/output vectors, but
> not with 32-bit ones because the vectorizer expects wider types
> availability for the intermediate values, but int32_t + int32_t does
> not involve wider types in the IR.

Right.  Like you say, it's only valid to use V(R)HADD if, in the source
code, the addition and shift have a wider precision than the operands.
That happens naturally for 8-bit and 16-bit operands, since C arithmetic
promotes them to "int" first.  But for 32-bit operands, the C code needs
to do the addition and shift in 64 bits.  Doing them in 64 bits should
be fine for narrower operands too.

So:

> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c 
> b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> new file mode 100644
> index 00000000000..40489ecc67d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <stdint.h>
> +
> +#define FUNC(SIGN, TYPE, BITS, OP, NAME)                             \
> +  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ 
> dest, \
> +                                       TYPE##BITS##_t *a, TYPE##BITS##_t *b) 
> { \
> +    int i;                                                           \
> +    for (i=0; i < (128 / BITS); i++) {                                       
> \
> +      dest[i] = (a[i] OP b[i]) >> 1;                                 \
> +    }                                                                        
> \
> +}
> +

…it should work if you make this "((int64_t) a[i] OP b[i]) >> 1".

> As noted in neon-vhadd-1.c, I couldn't write a test able to use Neon
> vectorization with 64-bit vectors: we default to
> -mvectorize-with-neon-quad, and attempts to use
> -mvectorize-with-neon-double resulted in much worse code, which this
> patch does not aim at improving.

I guess this is because the MVE_2 mode iterators only include 128-bit types.
Leaving Neon double as future work sounds good though.

And yeah, the code for V(R)HADD-equivalent operations is much worse when
V(R)HADD isn't available, since the compiler really does need to double
the precision of the operands, do double-precision addition,
do double-precision shifts, and then truncate back.  So this looks
like the expected behaviour.

Thanks,
Richard

Reply via email to