On Wed, May 23, 2018 at 11:19 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, May 22, 2018 at 2:11 PM Richard Sandiford < > richard.sandif...@linaro.org> wrote: > >> Richard Biener <richard.guent...@gmail.com> writes: >> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <bin.ch...@arm.com> wrote: >> > >> >> Hi, >> >> As reported in PR85804, bump step is wrongly computed for vector(1) > load >> > of >> >> single-element group access. This patch fixes the issue by correcting >> > bump >> >> step computation for the specific VMAT_CONTIGUOUS case. >> > >> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK? >> > >> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus. >> > We'd fall into the grouped_load case otherwise which should handle >> > the situation correctly? >> > >> > Richard? > >> Yeah, I agree. I mentioned to Bin privately that that was probably >> a misstep and that we should instead continue to treat them as >> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector >> from the array of loaded vectors, instead of doing an actual permute. > > I'd classify them as VMAT_ELEMENTWISE instead. CONTIGUOUS > should be only for no-gap vectors. How do we classify single-element > interleaving? That would be another classification choice. Yes, I suggested this offline too, but Richard may have more to say about this. One thing worth noting is classifying it as VMAT_ELEMENTWISE would disable vectorization in this case because of cost model issue as commented at the end of get_load_store_type.
Thanks, bin > >> (Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow >> gaps there. But it might be easiest to handle both loads and stores >> in the same way.) > >> Although it still seems weird to "vectorise" stuff to one element. >> Why not leave the original scalar code in place, and put the onus on >> whatever wants to produce or consume a V1 to do the appropriate >> conversion? > > Yeah, V1 is somewhat awkward to deal with but I think the way we're > doing it right now is OK. Note if we leave the original scalar code in > place we still have to deal with larger unrolling factors thus we'd have > to duplicate the scalar code and have different IVs anyways. > > Richard. > >> Thanks, >> Richard