On Fri, 1 Mar 2013 11:07:17 +0100
Richard Biener <richard.guent...@gmail.com> wrote:

> On Wed, Feb 27, 2013 at 6:29 PM, Julian Brown
> <jul...@codesourcery.com> wrote:
> > Hi,
> >
> > Several new (ish?) autovectorizer features have apparently caused
> > NEON support for same to regress quite heavily in big-endian mode.
> > This patch is an attempt to fix things up, but is not without
> > problems -- maybe someone will have a suggestion as to how we
> > should proceed.
> >
> > The problem (as ever) is that the ARM backend must lie to the
> > middle-end about the layout of NEON vectors in big-endian mode (due
> > to ABI requirements, VFP compatibility, and the middle-end
> > semantics of vector indices being equivalent to those of an array
> > with the same type of elements when stored in memory).
> 
> Why not simply give up?  Thus, make autovectorization unsupported for
> ARM big-endian targets?

That's certainly a tempting option...

> Do I understand correctly that the "only" issue is memory vs. register
> element ordering?  Thus a fixup could be as simple as extra shuffles
> inserted after vector memory loads and before vector memory stores?
> (with the hope of RTL optimizers optimizing those)?

It's not even necessary to use explicit shuffles -- NEON has perfectly
good instructions for loading/storing vectors in the "right" order, in
the form of vld1 & vst1. I'm afraid the solution to this problem might
have been staring us in the face for years, which is simply to forbid
vldr/vstr/vldm/vstm (the instructions which lead to weird element
permutations in BE mode) for loading/storing NEON vectors altogether.
That way the vectorizer gets what it wants, the intrinsics can continue
to use __builtin_shuffle exactly as they are doing, and we get to
remove all the bits which fiddle vector element numbering in BE mode in
the ARM backend.
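
To make the ordering difference concrete, here is a rough sketch in portable C. It is only a model, not NEON code: it assumes a D register can be treated as a 64-bit value with 16-bit lane i in bits [16*i, 16*i+15], that a big-endian vldr behaves like a single 64-bit big-endian load, and that a big-endian vld1.16 loads each element separately into the lane with the same index. The helper names (model_vldr_be, model_vld1_16_be, lane16, check_model) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: extract 16-bit lane `lane` from a 64-bit "D register",
   assuming lane i occupies bits [16*i, 16*i+15].  */
static uint16_t lane16(uint64_t dreg, unsigned lane)
{
    return (uint16_t)(dreg >> (16u * lane));
}

/* Assumed model of vldr in big-endian mode: one 64-bit load in which
   the byte at the lowest address becomes the most significant byte.  */
static uint64_t model_vldr_be(const uint8_t *mem)
{
    uint64_t reg = 0;
    for (size_t i = 0; i < 8; i++)
        reg = (reg << 8) | mem[i];
    return reg;
}

/* Assumed model of vld1.16 in big-endian mode: each 16-bit element is
   loaded big-endian on its own, and element i lands in lane i.  */
static uint64_t model_vld1_16_be(const uint8_t *mem)
{
    uint64_t reg = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t elt = (uint16_t)((mem[2 * lane] << 8) | mem[2 * lane + 1]);
        reg |= (uint64_t)elt << (16u * lane);
    }
    return reg;
}

/* Compare both models on the in-memory image of
   uint16_t a[4] = { 0x0102, 0x0304, 0x0506, 0x0708 } (big-endian).  */
static void check_model(void)
{
    const uint16_t a[4] = { 0x0102, 0x0304, 0x0506, 0x0708 };
    const uint8_t mem[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

    uint64_t via_vldr = model_vldr_be(mem);
    uint64_t via_vld1 = model_vld1_16_be(mem);

    for (unsigned i = 0; i < 4; i++) {
        /* vld1-style load: lane number matches array index.  */
        assert(lane16(via_vld1, i) == a[i]);
        /* vldr-style load: lanes come out reversed within the D register.  */
        assert(lane16(via_vldr, i) == a[3 - i]);
    }
}
```

Under those assumptions, calling check_model() passes: the vld1-style load keeps lane numbers equal to array indices, which is what the middle-end expects, whereas the vldr-style load reverses the lanes within the D register.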

I can't exactly remember why we didn't do that to start with. I think
the problem was ABI-related, or to do with transferring NEON vectors
to/from ARM registers when it was necessary to do that... I'm planning
to do some archaeology to try to see if I can figure out a definitive
answer.

(Previous discussions include, e.g.:

http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00876.html

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html

http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000437.html

It looks like ABI boundaries require vldr/vstr/vldm/vstm ordering.
Maybe those can be treated as "opaque" transfers that continue to use
the same instructions & ordering, while vld1/vst1 are used everywhere
else?)

> Any "lies" are of course bad and you'll pay for them later.

Indeed :-).

Cheers,

Julian
