On Wed, 9 Jun 2021 at 17:39, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Christophe Lyon <christophe.l...@linaro.org> writes:
> > This patch adds support for auto-vectorization of average value
> > computation using vhadd or vrhadd, for both MVE and Neon.
> >
> > The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> > vec-common.md, I'm not sure how to factorize them without introducing
> > an unspec iterator?
> >
> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> >
> > Vectorization works with 8-bit and 16-bit input/output vectors, but
> > not with 32-bit ones, because the vectorizer expects a wider type to
> > be available for the intermediate values, and int32_t + int32_t does
> > not involve wider types in the IR.
> >
> > The testcases use a cast to int64_t to work around this and enable
> > vectorization with vectors of 32-bit elements.
>
> We're probably in violent agreement here and just phrasing it
> differently :-), but IMO this isn't a workaround.  It would be actively
> wrong to use V(R)HADD for the (u)int32_t version without the cast:
> V(R)HADD effectively computes a 33-bit addition result, instead of
> the 32-bit addition result in the original (cast-free) source code.
>
> I guess one could argue that overflow is undefined for int32_t + int32_t
> and so we could compute the full 33-bit result instead of using modulo
> arithmetic.  That might be too surprising though.  It also wouldn't
> help with uint32_t, since we do need to use modulo arithmetic for
> uint32_t + uint32_t.

Right. It would have been clearer if I had written different functions
for the different types, with the cast only in the int32_t version.
I used "workaround" because the int64_t cast looks odd alongside int8_t
and int16_t.

> So personally I think it would be better to drop the last two paragraphs.
>
> The patch LGTM though, thanks.
>
OK, thanks.

> Richard
