Re: RFC: Representing vector lane load/store operations

Richard Guenther Wed, 23 Mar 2011 02:23:26 -0700

On Tue, Mar 22, 2011 at 8:43 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Guenther <richard.guent...@gmail.com> writes:
>> Simple.  Just make them registers anyway (I did that in the past
>> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
>> on the decl.
>
> OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can
> do without it, because we do seem to cope quite well as things stand.
> But I suppose that might not hold true as the examples get more complicated.
>
>>   4. a vector-of-vectors type
>>
>>      Cons
>>         * I don't think we want that ;)
>
> Yeah :-)
>
>>>    __builtin_load_lanes (REF : array N*M of X)
>>>      returns array N of vector M of X
>>>      maps to vldN on ARM
>>>      in practice, the result would be used in assignments of the form:
>>>        vectorY = ARRAY_REF <result, Y>
>>>
>>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>>      returns array N*M of X
>>>      maps to vstN on ARM
>>>      in practice, the argument would be populated by assignments of the form:
>>>        ARRAY_REF <VECTORS, Y> = vectorY
>>>
>>>    __builtin_load_lane (REF : array N of X,
>>>                         VECTORS : array N of vector M of X,
>>>                         LANE : integer)
>>>      returns array N of vector M of X
>>>      maps to vldN_lane on ARM
>>>
>>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>>                          LANE : integer)
>>>      returns array N of X
>>>      maps to vstN_lane on ARM
>>>
>>>    __builtin_load_dup (REF : array N of X)
>>>      returns array N of vector M of X
>>>      maps to vldN_dup on ARM
>>>
>>> I've hacked up a prototype of this and it seems to produce good code.
>>> What do you think?
>>
>> How do you expect these to be used?  That is, would you ever expect
>> components of those large vectors/arrays be used in operations
>> like add, or does the HW provide vector-lane variants for those as well?
>
> The individual vectors would be used for add, etc.  That's what the
> ARRAY_REF stuff above is supposed to be getting at.  So...
>
>> Thus, will
>>
>>   for (i=0; i<N; ++i)
>>     X[i] = Y[i] + Z[i];
>>
>> result in a single add per vector lane load or a single vector lane load
>> for M "unrolled" instances of (small) vector adds?  If the latter then
>> we have to think about indexing the vector lanes as well as allowing
>> partial stores (or have a vector-lane construct operation).  Representing
>> vector lanes as automatic memory (with array of vector type) makes
>> things easy, but eventually not very efficient.
>
> ...Ira would know best, but I don't think it would be used for this
> kind of loop.  It would be more something like:
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>
> (not a realistic example).  You'd then have:
>
>    compoundY = __builtin_load_lanes (Y);
>    red = ARRAY_REF <compoundY, 0>
>    green = ARRAY_REF <compoundY, 1>
>    blue = ARRAY_REF <compoundY, 2>
>    D1 = red + green
>    D2 = D1 + blue
>    MEM_REF <X> = D2;
>
> My understanding is that we'd never do any operations besides ARRAY_REFs
> on the compound value, and that the individual vectors would be treated
> pretty much like any other.


Ok, I thought it might be used to get a larger vectorization factor for
loads and stores, basically making further unrolling cheaper because you
don't have to duplicate the loads and stores.
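
For reference, the lane semantics discussed in the quoted proposal can be
modelled in plain scalar C. This is only a sketch under assumptions: the
function names load_lanes, store_lanes and sum_rgb, the element type int,
and the sizes N = 3 and M = 4 are all hypothetical and are not the proposed
builtins themselves. It shows the de-interleaving that vldN performs and
the red/green/blue sum from the example above.

```c
#define N 3   /* vectors per structure (assumed: red, green, blue) */
#define M 4   /* lanes per vector (assumed) */

/* Hypothetical scalar model of __builtin_load_lanes: field j of
   structure i in memory (ref[i * N + j]) lands in lane i of vector j. */
static void load_lanes(const int ref[N * M], int vecs[N][M])
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            vecs[j][i] = ref[i * N + j];
}

/* Hypothetical scalar model of __builtin_store_lanes: the inverse
   mapping, re-interleaving N vectors into N * M consecutive elements. */
static void store_lanes(int vecs[N][M], int out[N * M])
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            out[i * N + j] = vecs[j][i];
}

/* Models X[i] = Y[i].red + Y[i].green + Y[i].blue for M structures.
   Only "ARRAY_REF"-style accesses (compound[0], [1], [2]) touch the
   compound value; the adds work on the individual vectors. */
static void sum_rgb(const int y[N * M], int x[M])
{
    int compound[N][M];
    load_lanes(y, compound);
    for (int i = 0; i < M; ++i)
        x[i] = compound[0][i] + compound[1][i] + compound[2][i];
}
```

Each call to sum_rgb consumes M interleaved structures and produces M
sums, which matches the reading that the compound value is only ever
indexed and never operated on as a whole.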

>> I had new tree/stmt codes for array loads/stores for middle-end arrays.
>> Eventually the vector lane support can at least walk in the same direction
>> that middle-end arrays would ;)
>
> What's the status of the middle-end array stuff?  A quick search
> showed up your paper, but is it still WIP, or has it already gone in?
> (Showing my ignorance of tree-level stuff here. :-))  It does sound
> like it'd be a good fit for these ops.

Well, the work is basically suspended (though a lot of middle-end
surgery that was required went in) - I was stuck on the necessity
to have the Fortran frontend generate these expressions to have
testing on real code (rather than constructing examples from my
lame C frontend + builtins hack).  ISTR porting the patch to tuples;
the current patch seems to have two or three places that adjust
the middle-end in order to allow aggregate-typed SSA names.

But as you have partial defs of the vector lane array, the simplest
approach is probably to not make them a register.  Be prepared
for some surprises during RTL expansion though ;)
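
To make the "partial def" point concrete, here is a scalar C sketch of the
single-lane operations from the proposal. It rests on assumptions: the
names load_lane, store_lane and load_dup, the int element type, and N = 3,
M = 4 are hypothetical, and the model updates the array in place whereas
the proposed builtin takes VECTORS and returns a new array. What it
illustrates is that load_lane writes only one lane of each vector and
leaves the rest as they were.

```c
#include <assert.h>

#define N 3   /* vectors in the compound value (assumed) */
#define M 4   /* lanes per vector (assumed) */

/* Hypothetical model of __builtin_load_lane (vldN_lane): load one
   N-element structure into lane `lane` of each vector.  All other
   lanes keep their previous contents, i.e. this is a partial def. */
static void load_lane(const int ref[N], int vecs[N][M], int lane)
{
    assert(lane >= 0 && lane < M);
    for (int j = 0; j < N; ++j)
        vecs[j][lane] = ref[j];
}

/* Hypothetical model of __builtin_store_lane (vstN_lane): write lane
   `lane` of each vector out as one N-element structure. */
static void store_lane(int vecs[N][M], int lane, int out[N])
{
    assert(lane >= 0 && lane < M);
    for (int j = 0; j < N; ++j)
        out[j] = vecs[j][lane];
}

/* Hypothetical model of __builtin_load_dup (vldN_dup): load one
   N-element structure and replicate field j across every lane of
   vector j.  Unlike load_lane this defines the whole array. */
static void load_dup(const int ref[N], int vecs[N][M])
{
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < M; ++i)
            vecs[j][i] = ref[j];
}
```

Because load_lane both reads and writes the compound value, a register
representation would need a read-modify-write of the whole array for each
lane update, which is the case where keeping it in memory is simpler.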

Richard.

> Richard
>
