Richard Guenther <richard.guent...@gmail.com> writes:
> Simple. Just make them registers anyway (I did that in the past
> when working on middle-end arrays). You'd set DECL_GIMPLE_REG_P
> on the decl.

OK, thanks, I'll give that a go. TBH, I'm still hopeful we can do
without it, because we do seem to cope quite well as things stand.
But I suppose that might not hold true as the examples get more
complicated.

> 4. a vector-of-vectors type
>
> Cons
> * I don't think we want that ;)

Yeah :-)

>> __builtin_load_lanes (REF : array N*M of X)
>>   returns array N of vector M of X
>>   maps to vldN on ARM
>>   in practice, the result would be used in assignments of the form:
>>     vectorY = ARRAY_REF <result, Y>
>>
>> __builtin_store_lanes (VECTORS : array N of vector M of X)
>>   returns array N*M of X
>>   maps to vstN on ARM
>>   in practice, the argument would be populated by assignments of the form:
>>     ARRAY_REF <VECTORS, Y> = vectorY
>>
>> __builtin_load_lane (REF : array N of X,
>>                      VECTORS : array N of vector M of X,
>>                      LANE : integer)
>>   returns array N of vector M of X
>>   maps to vldN_lane on ARM
>>
>> __builtin_store_lane (VECTORS : array N of vector M of X,
>>                       LANE : integer)
>>   returns array N of X
>>   maps to vstN_lane on ARM
>>
>> __builtin_load_dup (REF : array N of X)
>>   returns array N of vector M of X
>>   maps to vldN_dup on ARM
>>
>> I've hacked up a prototype of this and it seems to produce good code.
>> What do you think?
>
> How do you expect these to be used? That is, would you ever expect
> components of those large vectors/arrays be used in operations
> like add, or does the HW provide vector-lane variants for those as well?

The individual vectors would be used for add, etc. That's what the
ARRAY_REF stuff above is supposed to be getting at. So...

> Thus, will
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i] + Z[i];
>
> result in a single add per vector lane load or a single vector lane load
> for M "unrolled" instances of (small) vector adds? If the latter then
> we have to think about indexing the vector lanes as well as allowing
> partial stores (or have a vector-lane construct operation).
> Representing vector lanes as automatic memory (with array of vector
> type) makes things easy, but eventually not very efficient.

...Ira would know best, but I don't think it would be used for this
kind of loop. It would be more something like:

    for (i=0; i<N; ++i)
      X[i] = Y[i].red + Y[i].blue + Y[i].green;

(not a realistic example). You'd then have:

    compoundY = __builtin_load_lanes (Y);
    red = ARRAY_REF <compoundY, 0>
    green = ARRAY_REF <compoundY, 1>
    blue = ARRAY_REF <compoundY, 2>
    D1 = red + green
    D2 = D1 + blue
    MEM_REF <X> = D2;

My understanding is that we'd never do any operations besides
ARRAY_REFs on the compound value, and that the individual vectors
would be treated pretty much like any other.

> I had new tree/stmt codes for array loads/stores for middle-end arrays.
> Eventually the vector lane support can at least walk in the same direction
> that middle-end arrays would ;)

What's the status of the middle-end array stuff? A quick search
showed up your paper, but is it still WIP, or has it already gone in?
(Showing my ignorance of tree-level stuff here. :-)) It does sound
like it'd be a good fit for these ops.

Richard
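
For illustration only, the load_lanes semantics discussed above can be modelled in scalar C for N = 3 and M = 4. This is a rough sketch, not the prototype's code and not GCC internals: the names vec, load_lanes, vec_add and sum_fields are hypothetical, X is assumed to be int, and the de-interleaving loop merely mimics what vld3 would do in one instruction.

```c
#define N 3   /* number of vectors, i.e. fields per structure (assumed) */
#define M 4   /* lanes per vector (assumed) */

/* Hypothetical stand-in for "vector M of X", with X = int.  */
typedef struct { int lane[M]; } vec;

/* Scalar model of __builtin_load_lanes: REF holds N*M interleaved
   elements; RESULT receives N vectors of M lanes each, so that
   RESULT[v] collects field v of every structure.  */
static void load_lanes (const int ref[N * M], vec result[N])
{
  for (int v = 0; v < N; v++)
    for (int l = 0; l < M; l++)
      result[v].lane[l] = ref[l * N + v];
}

/* Ordinary element-wise vector add.  */
static vec vec_add (vec a, vec b)
{
  vec r;
  for (int l = 0; l < M; l++)
    r.lane[l] = a.lane[l] + b.lane[l];
  return r;
}

/* Models one vector iteration of the red/green/blue loop: the only
   operations on the compound value are the array references.  */
static void sum_fields (const int y[N * M], int x[M])
{
  vec compound[N];
  load_lanes (y, compound);

  vec red = compound[0];     /* ARRAY_REF <compoundY, 0> */
  vec green = compound[1];   /* ARRAY_REF <compoundY, 1> */
  vec blue = compound[2];    /* ARRAY_REF <compoundY, 2> */

  vec d1 = vec_add (red, green);
  vec d2 = vec_add (d1, blue);

  for (int l = 0; l < M; l++)   /* MEM_REF <X> = D2 */
    x[l] = d2.lane[l];
}
```

With Y = {1,2,3, 10,20,30, 100,200,300, 1000,2000,3000}, sum_fields stores 6, 60, 600 and 6000 into X. Once extracted, the individual vectors are used like any other vector value.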