Richard Guenther writes:
> Simple. Just make them registers anyway (I did that in the past
> when working on middle-end arrays). You'd set DECL_GIMPLE_REG_P
> on the decl.
OK, thanks, I'll give that a go. TBH, I'm still hopeful we can
do without it, because we do seem to cope quite well as things stand.
But I suppose that might not hold true as the examples get more complicated.
> 4. a vector-of-vectors type
>
> Cons
> * I don't think we want that ;)
Yeah :-)
>> __builtin_load_lanes (REF : array N*M of X)
>>   returns array N of vector M of X
>>   maps to vldN on ARM
>>   in practice, the result would be used in assignments of the form:
>>     vectorY = ARRAY_REF <result, Y>
>>
>> __builtin_store_lanes (VECTORS : array N of vector M of X)
>>   returns array N*M of X
>>   maps to vstN on ARM
>>   in practice, the argument would be populated by assignments of the form:
>>     ARRAY_REF <VECTORS, Y> = vectorY
>>
>> __builtin_load_lane (REF : array N of X,
>>                      VECTORS : array N of vector M of X,
>>                      LANE : integer)
>>   returns array N of vector M of X
>>   maps to vldN_lane on ARM
>>
>> __builtin_store_lane (VECTORS : array N of vector M of X,
>>                       LANE : integer)
>>   returns array N of X
>>   maps to vstN_lane on ARM
>>
>> __builtin_load_dup (REF : array N of X)
>>   returns array N of vector M of X
>>   maps to vldN_dup on ARM
>>
>> I've hacked up a prototype of this and it seems to produce good code.
>> What do you think?
>
> How do you expect these to be used? That is, would you ever expect
> components of those large vectors/arrays be used in operations
> like add, or does the HW provide vector-lane variants for those as well?
The individual vectors would be used for add, etc. That's what the
ARRAY_REF stuff above is supposed to be getting at. So...
> Thus, will
>
> for (i=0; i<N; ++i)
> X[i] = Y[i] + Z[i];
>
> result in a single add per vector lane load or a single vector lane load
> for M "unrolled" instances of (small) vector adds? If the latter then
> we have to think about indexing the vector lanes as well as allowing
> partial stores (or have a vector-lane construct operation). Representing
> vector lanes as automatic memory (with array of vector type) makes
> things easy, but eventually not very efficient.
...Ira would know best, but I don't think it would be used for this
kind of loop. It would be more something like:
    for (i=0; i<N; ++i)
      X[i] = Y[i].red + Y[i].blue + Y[i].green;
(not a realistic example). You'd then have:
    compoundY = __builtin_load_lanes (Y);
    red = ARRAY_REF <compoundY, 0>;
    green = ARRAY_REF <compoundY, 1>;
    blue = ARRAY_REF <compoundY, 2>;
    D1 = red + green;
    D2 = D1 + blue;
    MEM_REF <X> = D2;
My understanding is that we'd never do any operations besides ARRAY_REFs
on the compound value, and that the individual vectors would be treated
pretty much like any other.
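
To make the intended usage a bit more concrete, here is a scalar C model
of the sequence above. It is only a sketch under assumptions: load_lanes,
vec_add, sum_channels, vec_t, compound_t and the sizes N=3, M=4 are
made-up names for illustration (not the prototype's builtins or anything
in GCC), and it assumes Y is laid out in memory as interleaved
red/green/blue triples.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is a scalar model, not GCC code. */
#define N_VECS  3   /* N: number of vectors in the compound value */
#define M_LANES 4   /* M: number of lanes in each vector */

typedef struct { int lane[M_LANES]; } vec_t;       /* "vector M of X" */
typedef struct { vec_t vec[N_VECS]; } compound_t;  /* "array N of vector M of X" */

/* Model of the load-lanes operation: de-interleave N*M consecutive
   elements so that vector v holds element v of every group of N.  */
compound_t load_lanes(const int ref[N_VECS * M_LANES])
{
    compound_t result;
    for (size_t lane = 0; lane < M_LANES; ++lane)
        for (size_t v = 0; v < N_VECS; ++v)
            result.vec[v].lane[lane] = ref[lane * N_VECS + v];
    return result;
}

/* Ordinary element-wise vector add; nothing lane-specific about it. */
vec_t vec_add(vec_t a, vec_t b)
{
    vec_t sum;
    for (size_t lane = 0; lane < M_LANES; ++lane)
        sum.lane[lane] = a.lane[lane] + b.lane[lane];
    return sum;
}

/* One vectorized iteration of X[i] = red + green + blue for M elements. */
void sum_channels(const int y[N_VECS * M_LANES], int x[M_LANES])
{
    assert(y != NULL && x != NULL);

    compound_t compoundY = load_lanes(y);

    /* The only operations on the compound value are the element reads
       (the ARRAY_REFs in the sequence above).  */
    vec_t red   = compoundY.vec[0];
    vec_t green = compoundY.vec[1];
    vec_t blue  = compoundY.vec[2];

    vec_t d1 = vec_add(red, green);
    vec_t d2 = vec_add(d1, blue);

    /* Model of the final store of the result vector to X. */
    for (size_t lane = 0; lane < M_LANES; ++lane)
        x[lane] = d2.lane[lane];
}
```

The point of the model is that the compound value is touched only by the
three element reads; red, green and blue then flow through plain vector
adds like any other vector.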
> I had new tree/stmt codes for array loads/stores for middle-end arrays.
> Eventually the vector lane support can at least walk in the same direction
> that middle-end arrays would ;)
What's the status of the middle-end array stuff? A quick search
showed up your paper, but is it still WIP, or has it already gone in?
(Showing my ignorance of tree-level stuff here. :-)) It does sound
like it'd be a good fit for these ops.
Richard