On Fri, 1 Mar 2013 11:07:17 +0100 Richard Biener <richard.guent...@gmail.com> wrote:
> On Wed, Feb 27, 2013 at 6:29 PM, Julian Brown
> <jul...@codesourcery.com> wrote:
> > Hi,
> >
> > Several new (ish?) autovectorizer features have apparently caused
> > NEON support for same to regress quite heavily in big-endian mode.
> > This patch is an attempt to fix things up, but is not without
> > problems -- maybe someone will have a suggestion as to how we
> > should proceed.
> >
> > The problem (as ever) is that the ARM backend must lie to the
> > middle-end about the layout of NEON vectors in big-endian mode (due
> > to ABI requirements, VFP compatibility, and the middle-end
> > semantics of vector indices being equivalent to those of an array
> > with the same type of elements when stored in memory).
>
> Why not simply give up? Thus, make autovectorization unsupported for
> ARM big-endian targets?

That's certainly a tempting option...

> Do I understand correctly that the "only" issue is memory vs. register
> element ordering? Thus a fixup could be as simple as extra shuffles
> inserted after vector memory loads and before vector memory stores?
> (with the hope of RTL optimizers optimizing those)?

It's not even necessary to use explicit shuffles -- NEON has perfectly
good instructions for loading/storing vectors in the "right" order, in
the form of vld1 & vst1. I'm afraid the solution to this problem might
have been staring us in the face for years, which is simply to forbid
vldr/vstr/vldm/vstm (the instructions which lead to weird element
permutations in BE mode) for loading/storing NEON vectors altogether.
That way the vectorizer gets what it wants, the intrinsics can continue
to use __builtin_shuffle exactly as they are doing, and we get to
remove all the bits which fiddle vector element numbering in BE mode in
the ARM backend.

I can't exactly remember why we didn't do that to start with. I think
the problem was ABI-related, or to do with transferring NEON vectors
to/from ARM registers when it was necessary to do that... I'm planning
to do some archaeology to try to see if I can figure out a definitive
answer.

(Previous discussions include, e.g.:

http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00876.html
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html
http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000437.html

it looks like ABI boundaries require vldr/vstr/vldm/vstm ordering:
maybe those can be treated as "opaque" transfers and continue to use
the same instructions & ordering, but vld1/vst1 can be used everywhere
else?)

> Any "lies" are of course bad and you'll pay for them later.

Indeed :-).

Cheers,

Julian
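
To make the element-ordering point concrete, here is a minimal Python
sketch of the two kinds of load on a big-endian target. It is a
simplified model, not a description of the hardware or of GCC: it
assumes a vldr-style load behaves like one 64-bit big-endian access, a
vld1.16-style load places memory element i in lane i, and lane 0 is the
least significant bits of the register. All function names are
hypothetical and exist only for this illustration.

```python
def to_big_endian_bytes(elements, elem_bits):
    """Lay out array elements in memory the way a big-endian target would."""
    out = bytearray()
    for e in elements:
        out += e.to_bytes(elem_bits // 8, "big")
    return bytes(out)


def lanes_of(reg_value, elem_bits, n_lanes):
    """Split a register value into lanes; lane 0 is the least significant bits."""
    mask = (1 << elem_bits) - 1
    return [(reg_value >> (i * elem_bits)) & mask for i in range(n_lanes)]


def load_whole_register(mem):
    """Assumed model of a vldr-style load: one big-endian 64-bit access."""
    return int.from_bytes(mem, "big")


def load_per_element(mem, elem_bits):
    """Assumed model of a vld1-style load: memory element i goes to lane i."""
    size = elem_bits // 8
    reg = 0
    for i in range(len(mem) // size):
        elem = int.from_bytes(mem[i * size:(i + 1) * size], "big")
        reg |= elem << (i * elem_bits)
    return reg


# Four 16-bit elements, as an array would be stored in memory.
array = [0x0102, 0x0304, 0x0506, 0x0708]
mem = to_big_endian_bytes(array, 16)

vldr_lanes = lanes_of(load_whole_register(mem), 16, 4)
vld1_lanes = lanes_of(load_per_element(mem, 16), 16, 4)

print("vldr-style lanes:", [hex(x) for x in vldr_lanes])
print("vld1-style lanes:", [hex(x) for x in vld1_lanes])
```

Under these assumptions the vldr-style load leaves the lanes in the
reverse of array order, while the vld1-style load keeps lane i equal to
array element i, which is the numbering the middle-end expects.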