Re: RFC: Representing vector lane load/store operations

Richard Sandiford Tue, 22 Mar 2011 12:43:36 -0700

Richard Guenther <richard.guent...@gmail.com> writes:
> Simple.  Just make them registers anyway (I did that in the past
> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
> on the decl.


OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can
do without it, because we do seem to cope quite well as things stand.
But I suppose that might not hold true as the examples get more complicated.

>   4. a vector-of-vectors type
>
>      Cons
>         * I don't think we want that ;)

Yeah :-)

>>    __builtin_load_lanes (REF : array N*M of X)
>>      returns array N of vector M of X
>>      maps to vldN on ARM
>>      in practice, the result would be used in assignments of the form:
>>        vectorY = ARRAY_REF <result, Y>
>>
>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>      returns array N*M of X
>>      maps to vstN on ARM
>>      in practice, the argument would be populated by assignments of the form:
>>        ARRAY_REF <VECTORS, Y> = vectorY
>>
>>    __builtin_load_lane (REF : array N of X,
>>                         VECTORS : array N of vector M of X,
>>                         LANE : integer)
>>      returns array N of vector M of X
>>      maps to vldN_lane on ARM
>>
>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>                          LANE : integer)
>>      returns array N of X
>>      maps to vstN_lane on ARM
>>
>>    __builtin_load_dup (REF : array N of X)
>>      returns array N of vector M of X
>>      maps to vldN_dup on ARM
>>
>> I've hacked up a prototype of this and it seems to produce good code.
>> What do you think?
>
> How do you expect these to be used?  That is, would you ever expect
> components of those large vectors/arrays be used in operations
> like add, or does the HW provide vector-lane variants for those as well?

The individual vectors would be used for add, etc.  That's what the
ARRAY_REF stuff above is supposed to be getting at.  So...

> Thus, will
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i] + Z[i];
>
> result in a single add per vector lane load or a single vector lane load
> for M "unrolled" instances of (small) vector adds?  If the latter then
> we have to think about indexing the vector lanes as well as allowing
> partial stores (or have a vector-lane construct operation).  Representing
> vector lanes as automatic memory (with array of vector type) makes
> things easy, but eventually not very efficient.

...Ira would know best, but I don't think it would be used for this
kind of loop.  It would be more something like:

   for (i=0; i<N; ++i)
     X[i] = Y[i].red + Y[i].blue + Y[i].green;
    
(not a realistic example).  You'd then have:

    compoundY = __builtin_load_lanes (Y);
    red = ARRAY_REF <compoundY, 0>;
    green = ARRAY_REF <compoundY, 1>;
    blue = ARRAY_REF <compoundY, 2>;
    D1 = red + green;
    D2 = D1 + blue;
    MEM_REF <X> = D2;

My understanding is that we'd never do any operations besides ARRAY_REFs
on the compound value, and that the individual vectors would be treated
pretty much like any other.
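
To make the intended semantics a bit more concrete, here is a rough scalar
model in plain C of what the load-lanes operation would do for the
red/green/blue case. It is only a sketch: the names vec_t, compound_t,
load_lanes, vec_add and sum_pixels are made up for illustration, the element
type is assumed to be int, and N = 3 and M = 4 are arbitrary choices. The
real operation would be a single vld3 on ARM rather than a loop.

```c
#include <stddef.h>

#define N_FIELDS 3  /* N: interleaved fields per element (red, green, blue) */
#define N_LANES  4  /* M: lanes per vector (arbitrary for this sketch) */

/* Hypothetical stand-ins for "vector M of X" and "array N of vector M of X". */
typedef struct { int lane[N_LANES]; } vec_t;
typedef struct { vec_t v[N_FIELDS]; } compound_t;

/* Scalar model of the proposed load-lanes operation.  REF points at
   N_FIELDS * N_LANES interleaved values; field F of element L ends up
   in lane L of vector F.  */
static compound_t load_lanes(const int *ref)
{
    compound_t result;
    for (size_t lane = 0; lane < N_LANES; ++lane)
        for (size_t field = 0; field < N_FIELDS; ++field)
            result.v[field].lane[lane] = ref[lane * N_FIELDS + field];
    return result;
}

/* Ordinary element-wise vector add.  */
static vec_t vec_add(vec_t a, vec_t b)
{
    vec_t sum;
    for (size_t lane = 0; lane < N_LANES; ++lane)
        sum.lane[lane] = a.lane[lane] + b.lane[lane];
    return sum;
}

/* One vector iteration of X[i] = Y[i].red + Y[i].green + Y[i].blue.
   Only ARRAY_REF-style extraction is applied to the compound value;
   the individual vectors are then used like any other vector.  */
static void sum_pixels(const int *y, int *x)
{
    compound_t compound_y = load_lanes(y);
    vec_t red = compound_y.v[0];
    vec_t green = compound_y.v[1];
    vec_t blue = compound_y.v[2];
    vec_t d1 = vec_add(red, green);
    vec_t d2 = vec_add(d1, blue);
    for (size_t lane = 0; lane < N_LANES; ++lane)
        x[lane] = d2.lane[lane];
}
```

The store-lanes operation would be the mirror image: it would take the
compound value and write the interleaved array back out.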

> I had new tree/stmt codes for array loads/stores for middle-end arrays.
> Eventually the vector lane support can at least walk in the same direction
> that middle-end arrays would ;)

What's the status of the middle-end array stuff?  A quick search
showed up your paper, but is it still WIP, or has it already gone in?
(Showing my ignorance of tree-level stuff here. :-))  It does sound
like it'd be a good fit for these ops.

Richard
