On Mon, 4 Mar 2013 13:08:57 +0000 Paul Brook <p...@codesourcery.com> wrote:
> > > > I can't exactly remember why we didn't do that to start with. I
> > > > think the problem was ABI-related, or to do with transferring
> > > > NEON vectors to/from ARM registers when it was necessary to do
> > > > that... I'm planning to do some archaeology to try to see if I
> > > > can figure out a definitive answer.
> > >
> > > The ABI defined vector types (uint32x4_t etc) are defined to be in
> > > vldm/vstm order.
> >
> > There's no conflict with the ABI-defined vector order -- the ABI
> > (looking at AAPCS, IHI 0042D) describes "containerized" vectors
> > which should be used to pass and return vector quantities at ABI
> > boundaries, but I couldn't find any further restrictions.
> > Internally to a function, we are still free to use vld1/vst1 vector
> > ordering. Using "containerized"/opaque transfers, the bit pattern
> > of a vector in one function (using vld1/vst1 ordering internally)
> > will of course remain unchanged if passed to another function and
> > using the same ordering there also.
>
> Ah, ok. If you make the ABI defined types distinct from the GCC
> generic vector types (as used by the vectorizer), then in principle
> that should work. I agree that current GCC probably does not have the
> infrastructure to do that, and some of the vector code plays a bit
> fast and loose with type conversions/subregs.

(Subregs use memory ordering for the byte offset, so I think those are
OK if we use array-order loads/stores pervasively. I'm not 100% sure
though...)

> Remember that it's not just function arguments, it's any interface
> shared between functions. i.e. including structures and global
> variables.

Ugh, I hadn't considered structures or global variables :-/.
If we decide they have to use the containerized format also, then we
lose a lot of the supposed advantage of using array-format vectors
"everywhere" (apart from at procedure call boundaries). For instance,
if we want code with a global variable like:

  union {
    char myarr[8];
    v8qi myvec;
  } foo;

to do the right thing (i.e., with elements of myvec corresponding
one-to-one to elements of myarr), then using the containerized format
for accesses to myvec would be a non-starter.

Skimming the AAPCS, I'm not sure it actually specifies anything about
the layout of global variables which may be shared between functions
(it'd make sense to do so -- maybe it's elsewhere in the EABI
documents). Aggregates passed by value could also be
marshalled/unmarshalled like vectors, though that starts to sound much
less tractable than dealing with vectors alone.

> > Actually making that work (especially efficiently) with GCC is a
> > slightly different matter. Let's call vldm/vstm-ordered vectors
> > "containerized" format, and vld1/vst1-ordered vectors "array"
> > format. We need to introduce the concept of marshalling vector
> > arguments from array format to containerized format when passing
> > them to a function, and unmarshalling those vector arguments back
> > the other way on function entry. AFAICT, GCC does not have suitable
> > infrastructure for implementing such functionality at present:
> > consider that e.g. vectors passed by value on the stack should use
> > containerized format, which means the called function cannot simply
> > dereference the stack pointer to read the vector:
>
> IIRC I/we tried to do something very similar (possibly the other way
> around) by abusing the unaligned load mechanism. I don't remember
> why that failed.

That'd be this conversation:

http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00876.html

We only tweaked the vectorizer to always use movmisalign, leaving
intrinsics & generic vectors using vldm/vstm order.
Fixing up the resulting chaos using ad-hoc hacks didn't go down too well
with maintainers, so the patch fizzled out.

Cheers,

Julian
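
P.S. To make the union case above a bit more concrete, here is a
minimal sketch in C. It assumes v8qi is a GCC generic vector of eight
chars declared with the vector_size attribute; the names vec_or_array,
elements_correspond, reverse_elements and union_demo are made up for
illustration. The element reversal is only a simplified model of how a
containerized 64-bit image might differ from array format on a
big-endian target, not something taken from the AAPCS text.

```c
#include <assert.h>

/* Assumption: "v8qi" is a GCC/Clang generic vector of eight chars. */
typedef char v8qi __attribute__((vector_size(8)));

/* Same shape as the union in the text; the tag name is hypothetical. */
union vec_or_array {
    char myarr[8];
    v8qi myvec;
};

/* Returns 1 if element i of the vector view equals element i of the
   array view for every i, i.e. the "array format" property holds. */
int elements_correspond(const union vec_or_array *u)
{
    for (int i = 0; i < 8; i++)
        if (u->myvec[i] != u->myarr[i])
            return 0;
    return 1;
}

/* Hypothetical model (an assumption for illustration only): treat the
   containerized image of a 64-bit vector as the element-reversed array. */
void reverse_elements(const char in[8], char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = in[7 - i];
}

/* Builds foo, checks the one-to-one correspondence, then shows that in
   the reversed image element 0 holds what the array calls element 7. */
int union_demo(void)
{
    union vec_or_array foo, image;

    for (int i = 0; i < 8; i++)
        foo.myarr[i] = (char)('a' + i);
    assert(elements_correspond(&foo));

    reverse_elements(foo.myarr, image.myarr);
    return image.myvec[0] == foo.myarr[7];
}
```

Under that model, code indexing the reversed image by vector element
would see myarr[7] where it expected myarr[0], which is why using the
containerized format for accesses to myvec looks like a non-starter.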