On Wed, 9 Jun 2021 at 17:39, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Christophe Lyon <christophe.l...@linaro.org> writes:
> > This patch adds support for auto-vectorization of average value
> > computation using vhadd or vrhadd, for both MVE and Neon.
> >
> > The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> > vec-common.md. I'm not sure how to factorize them without introducing
> > an unspec iterator?
> >
> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> >
> > Vectorization works with 8-bit and 16-bit input/output vectors, but
> > not with 32-bit ones, because the vectorizer expects wider types to be
> > available for the intermediate values, and int32_t + int32_t does
> > not involve wider types in the IR.
> >
> > The testcases use a cast to int64_t to work around this and enable
> > vectorization with vectors of 32-bit elements.
>
> We're probably in violent agreement here and just phrasing it
> differently :-), but IMO this isn't a workaround. It would be actively
> wrong to use V(R)HADD for the (u)int32_t version without the cast:
> V(R)HADD effectively computes a 33-bit addition result, instead of
> the 32-bit addition result in the original (cast-free) source code.
>
> I guess one could argue that overflow is undefined for int32_t + int32_t
> and so we could compute the full 33-bit result instead of using modulo
> arithmetic. That might be too surprising though. It also wouldn't
> help with uint32_t, since we do need to use modulo arithmetic for
> uint32_t + uint32_t.
Right. It would have been clearer if I had written different functions
for the different types, only needing the cast for the int32_t version.
I used "workaround" because the int64_t cast looks strange next to
int8_t and int16_t.

> So personally I think it would be better to drop the last two paragraphs.
>
> The patch LGTM though, thanks.
>
OK, thanks.

> Richard
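
[Editor's note: to make the two cases discussed above concrete, here is a
minimal sketch. These are not the testcases from the patch; the function
names and loop shapes are only illustrative. For 8-bit (and 16-bit)
elements the usual integer promotions already give the vectorizer a wider
intermediate type, so the plain averaging idiom can be recognized; for
32-bit elements the explicit 64-bit cast is what both matches the
v(r)hadd semantics (a full 33-bit sum rather than a modulo-2^32 one) and
exposes the wider intermediate type the vectorizer looks for.]

#include <stdint.h>

/* Floor average of 8-bit elements.  a[i] and b[i] are promoted to int,
   so the sum is already computed in a wider type and the truncating
   store lets the loop be recognized as a (floor) halving add without
   any explicit cast.  The >> 1 on the promoted int is an arithmetic
   shift in GCC, i.e. a floor division by 2.  */
void
avg_floor_s8 (int8_t *__restrict dst, const int8_t *__restrict a,
              const int8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (a[i] + b[i]) >> 1;
}

/* Ceiling (rounding) average of 32-bit elements.  Without the cast,
   a[i] + b[i] is a modulo-2^32 addition: with a[i] = b[i] = 0x80000000
   the sum wraps to 0 and the halved result is 0, whereas vrhadd.u32
   computes the full 33-bit sum and returns 0x80000000.  The uint64_t
   cast makes the wider intermediate explicit, so using vrhadd is
   correct and the vectorizer can see the wider type it requires.  */
void
avg_ceil_u32 (uint32_t *__restrict dst, const uint32_t *__restrict a,
              const uint32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = ((uint64_t) a[i] + b[i] + 1) >> 1;
}

[With auto-vectorization enabled (e.g. -O3), loops of this shape are
expected to map to vhadd.s8 and vrhadd.u32 respectively once the
patterns from the patch are in place.]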