Christophe Lyon <christophe.l...@linaro.org> writes:
> This patch adds support for auto-vectorization of average value
> computation using vhadd or vrhadd, for both MVE and Neon.
>
> The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> vec-common.md, I'm not sure how to factorize them without introducing
> an unspec iterator?
>
> It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
>
> Vectorization works with 8-bit and 16 bit input/output vectors, but
> not with 32-bit ones because the vectorizer expects wider types
> availability for the intermediate values, but int32_t + int32_t does
> not involve wider types in the IR.
>
> The testcases use a cast to int64_t to workaround this and enable
> vectorization with vectors of 32-bit elements.

We're probably in violent agreement here and just phrasing it
differently :-), but IMO this isn't a workaround.  It would be actively
wrong to use V(R)HADD for the (u)int32_t version without the cast:
V(R)HADD effectively computes a 33-bit addition result, instead of
the 32-bit addition result in the original (cast-free) source code.

I guess one could argue that overflow is undefined for int32_t + int32_t
and so we could compute the full 33-bit result instead of using modulo
arithmetic.  That might be too surprising though.  It also wouldn't
help with uint32_t, since we do need to use modulo arithmetic for
uint32_t + uint32_t.

So personally I think it would be better to drop the last two paragraphs.

The patch LGTM though, thanks.

Richard

Reply via email to