On Tue, 5 Mar 2013 10:42:59 +0100
Richard Biener <richard.guent...@gmail.com> wrote:

> On Tue, Mar 5, 2013 at 12:47 AM, Paul Brook <p...@codesourcery.com>
> wrote:
> >> I somehow missed the "Appendix A: Support for Advanced SIMD
> >> Extensions" in the AAPCS document (it's not in the TOC!). It looks
> >> like the builtin vector types are indeed defined to be stored in
> >> memory in vldm/vstm order -- I think that means we're back to
> >> square one.
> >
> > There's still the possibility of making gcc "generic" vector types
> > different from the ABI specified types[1], but that feels like it's
> > probably a really bad idea.
> >
> > Having a distinct set of types just for the vectorizer may be a
> > more viable option. IIRC the type selection hooks are more flexible
> > than when we first looked at this problem.
> >
> > Paul
> >
> > [1] e.g. int gcc __attribute__((vector_size(8)));  v.s. int32x2_t
> > eabi;
> 
> I think int32x2_t should not be a GCC vector type (thus not have a
> vector mode). The ABI specified types should map to an integer mode
> of the right size instead.  The vectorizer would then still use
> internal GCC vector types and modes and the backend needs to provide
> instruction patterns that do the right thing with the element
> ordering the vectorizer expects.
> 
> How are the int32x2_t types used?  I suppose they are arguments to
> the intrinsics.  Which means that for _most_ operations element order
> does not matter, thus a plus32x2 (int32x2_t x, int32x2_t y) can simply
> use the equivalent of return (int32x2_t)((gcc_int32x2_t)x +
> (gcc_int32x2_t)y). In intrinsics where order matters you'd insert
> appropriate __builtin_shuffle()s.

Maybe there's no need to interpret the vector layout for any of the
intrinsics -- just treat all inputs & outputs as opaque (there are
intrinsics for getting/setting lanes -- IMO these shouldn't attempt to
convert lane numbers at all, though they do at present). Several
intrinsics are currently implemented using __builtin_shuffle, e.g.:

__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
vrev64_s8 (int8x8_t __a)
{
  return (int8x8_t) __builtin_shuffle (__a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 
0 });
}

I'd imagine that if int8x8_t are not actual vector types, we could
invent extra builtins to convert them to and from such types to be able
to still do this kind of thing (in arm_neon.h, not necessarily for
direct use by users), i.e.:

typedef char gcc_int8x8_t __attribute__((vector_size(8)));

int8x8_t
vrev64_s8 (int8x8_t __a)
{
  gcc_int8x8_t tmp = __builtin_neon2generic (__a);
  tmp = __builtin_shuffle (tmp, (gcc_int8x8_t) { 7, 6, 5, 4, ... });
  return __builtin_generic2neon (tmp);
}

(On re-reading, that's basically the same as what you suggested, I
think.)

> Oh, of course do the above only for big-endian mode ...
> 
> The other way around, mapping intrinsics and ABI vectors to vector
> modes will have issues ... you'd have to guard all optab queries in
> the middle-end to fail for arm big-endian as they expect instruction
> patterns that deal with the GCC vector ordering.
> 
> Thus: model the backend after GCCs expectations and "fixup" the rest
> by fixing the ABI types and intrinsics.

I think this plan will work fine -- it has the added advantage (which
looks like a disadvantage, but really isn't) that generic vector
operations like:

void foo (void)
{
  int8x8_t x = { 0, 1, 2, 3, 4, 5, 6, 7 };
}

will *not* work -- nor will e.g. subscripting ABI-defined vectors using
[]s. At the moment using these features can lead to surprising results.

Unfortunately NEON's pretty complicated, and the ARM backend currently
uses vector modes quite heavily implementing it, so just using integer
modes for intrinsics is going to be tough. It might work to create a
shadow set of vector modes for use only by the intrinsics (O*mode for
"opaque" instead of V*mode, say), if the middle end won't barf at that.

Thanks,

Julian

Reply via email to