On Tue, Mar 22, 2011 at 8:43 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Guenther <richard.guent...@gmail.com> writes:
>> Simple. Just make them registers anyway (I did that in the past
>> when working on middle-end arrays). You'd set DECL_GIMPLE_REG_P
>> on the decl.
>
> OK, thanks, I'll give that a go. TBH, I'm still hopeful we can
> do without it, because we do seem to cope quite well as things stand.
> But I suppose that might not hold true as the examples get more complicated.
>
>> 4. a vector-of-vectors type
>>
>>    Cons
>>      * I don't think we want that ;)
>
> Yeah :-)
>
>>> __builtin_load_lanes (REF : array N*M of X)
>>>   returns array N of vector M of X
>>>   maps to vldN on ARM
>>>   in practice, the result would be used in assignments of the form:
>>>     vectorY = ARRAY_REF <result, Y>
>>>
>>> __builtin_store_lanes (VECTORS : array N of vector M of X)
>>>   returns array N*M of X
>>>   maps to vstN on ARM
>>>   in practice, the argument would be populated by assignments of the
>>>   form:
>>>     ARRAY_REF <VECTORS, Y> = vectorY
>>>
>>> __builtin_load_lane (REF : array N of X,
>>>                      VECTORS : array N of vector M of X,
>>>                      LANE : integer)
>>>   returns array N of vector M of X
>>>   maps to vldN_lane on ARM
>>>
>>> __builtin_store_lane (VECTORS : array N of vector M of X,
>>>                       LANE : integer)
>>>   returns array N of X
>>>   maps to vstN_lane on ARM
>>>
>>> __builtin_load_dup (REF : array N of X)
>>>   returns array N of vector M of X
>>>   maps to vldN_dup on ARM
>>>
>>> I've hacked up a prototype of this and it seems to produce good code.
>>> What do you think?
>>
>> How do you expect these to be used? That is, would you ever expect
>> components of those large vectors/arrays be used in operations
>> like add, or does the HW provide vector-lane variants for those as well?
>
> The individual vectors would be used for add, etc. That's what the
> ARRAY_REF stuff above is supposed to be getting at. So...
>
>> Thus, will
>>
>>   for (i=0; i<N; ++i)
>>     X[i] = Y[i] + Z[i];
>>
>> result in a single add per vector lane load or a single vector lane load
>> for M "unrolled" instances of (small) vector adds? If the latter then
>> we have to think about indexing the vector lanes as well as allowing
>> partial stores (or have a vector-lane construct operation). Representing
>> vector lanes as automatic memory (with array of vector type) makes
>> things easy, but eventually not very efficient.
>
> ...Ira would know best, but I don't think it would be used for this
> kind of loop. It would be more something like:
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>
> (not a realistic example). You'd then have:
>
>   compoundY = __builtin_load_lanes (Y);
>   red = ARRAY_REF <compoundY, 0>
>   green = ARRAY_REF <compoundY, 1>
>   blue = ARRAY_REF <compoundY, 2>
>   D1 = red + green
>   D2 = D1 + blue
>   MEM_REF <X> = D2;
>
> My understanding is that we'd never do any operations besides ARRAY_REFs
> on the compound value, and that the individual vectors would be treated
> pretty much like any other.
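
For readers following the quoted example, here is a rough scalar model in plain C of what the proposed load-lanes operation would compute for N = 3 and M = 4. It is only a sketch: the names `vec`, `compound`, `load_lanes_model`, `vec_add` and `sum_channels` are made up for illustration and are neither GCC builtins nor internals, and the element mapping (element i*N + k of the interleaved input lands in lane i of vector k) is an assumption based on how vld3 de-interleaves data on ARM.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 3, M = 4 };  /* 3 interleaved fields, 4 lanes per vector */

typedef struct { int lane[M]; } vec;    /* stands in for "vector M of X" */
typedef struct { vec v[N]; } compound;  /* stands in for "array N of vector M of X" */

/* Hypothetical scalar model of the load-lanes operation: element
   i*N + k of the interleaved input goes to lane i of vector k. */
static compound load_lanes_model(const int *ref)
{
    compound c;
    assert(ref != NULL);
    for (int k = 0; k < N; ++k)
        for (int i = 0; i < M; ++i)
            c.v[k].lane[i] = ref[i * N + k];
    return c;
}

/* Lane-wise add of two vectors. */
static vec vec_add(vec a, vec b)
{
    vec r;
    for (int i = 0; i < M; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* X[i] = Y[i].red + Y[i].green + Y[i].blue for one group of M elements.
   The compound value is only ever indexed (the ARRAY_REF step); all
   arithmetic happens on the individual vectors. */
static void sum_channels(const int *y, int *x)
{
    compound compoundY = load_lanes_model(y);
    vec red = compoundY.v[0];
    vec green = compoundY.v[1];
    vec blue = compoundY.v[2];
    vec d1 = vec_add(red, green);
    vec d2 = vec_add(d1, blue);
    for (int i = 0; i < M; ++i)
        x[i] = d2.lane[i];
}
```

Calling `sum_channels` on the interleaved input {1,2,3, 10,20,30, 100,200,300, 4,5,6} fills the output with one sum per element. A store-lanes model would simply be the inverse mapping.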

Ok, I thought it might be used to have a larger vectorization factor for
loads and stores, basically make further unrolling cheaper because you
don't have to duplicate the loads and stores.

>> I had new tree/stmt codes for array loads/stores for middle-end arrays.
>> Eventually the vector lane support can at least walk in the same direction
>> that middle-end arrays would ;)
>
> What's the status of the middle-end array stuff? A quick search
> showed up your paper, but is it still WIP, or has it already gone in?
> (Showing my ignorance of tree-level stuff here. :-)) It does sound
> like it'd be a good fit for these ops.

Well, the work is basically suspended (though a lot of middle-end surgery
that was required went in) - I was stuck on the necessity to have the
Fortran frontend generate these expressions to have testing on real code
(rather than constructing examples from my lame C frontend + builtins hack).

ISTR porting the patch to tuples, the current patch seems to have two
or three places that adjust the middle-end in order to allow aggregate
typed SSA names.

But as you have partial defs of the vector lane array the simplest
approach is probably to not make them a register. Be prepared for
some surprises during RTL expansion though ;)

Richard.

> Richard
>