Richard Sandiford <richard.sandif...@linaro.org> writes:
> Michael Matz <m...@suse.de> writes:
>> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>>> If so, then:
>>> 
>>> (a) That doesn't happen at the tree level.  The subtraction is still inside
>>>     the loop at RTL generation time.
>>> 
>>> (b) What's the advantage of introducing a new hoisted subtraction that
>>>     is going to be live throughout the loop, and then adding another IV
>>>     to it inside the loop, over using the original IV and incrementing it
>>>     in the normal way?
>>
>> It can reduce address complexity for one of the addresses.  E.g. given:
>>
>>  i=0; i < end; i+=4 
>>    p[i];
>>    q[i];
>>
>> -->
>>
>>  n=p; n < p+end; n+=4
>>    [n];
>>    (q-p)[n];
>>
>> Here (q-p) is loop-invariant, and the complexity of the first address is 
>> lower (no offset).  In fact the register pressure is lower by one too 
>> (three instead of four, including the end/p+end bound).
>
> But your second loop isn't what I was comparing it with.  I was comparing
> it with:
>
> n=p; n < p+end; n+=4, m+=4
>   [n]
>   [m]
>
> That has the same number of registers (3) and the same number of
> additions (2).  And the [m] is what we started with, so it was
> actually:
>
>   i=0; i<count; i+=1, n+=4, m+=4
>     [n]
>     [m]
>
> -->
>
>   i=0; i<count; i+=1, n+=4
>     [n]
>     (q-p)[n]
>
> (we don't get rid of "i" or "count" in this case.)
>
> If the target allows (q-p)[n] to be used directly as an address, and if
> the target has no post-increment instruction, then it might be better.
> But I think it's a loss on other targets.  It might even be a loss on
> targets (like PowerPC IIRC), that need base+index addresses to have
> the "real" base first.  This sort of transformation seems to make us
> lose track of which register is the base.

Actually, I take that back.  Use of (q-p)[n] (hoisted_diff[n]) as an
address is precisely the case in which we've decided we _don't_ want to
apply the optimisation (see the address_p code I quoted).  So I'm not
sure when it's a win even on targets like x86.
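
For concreteness, here is a rough C sketch of the three loop shapes
discussed above.  It is only an illustration: the function names are
made up, the stride is one int element rather than the 4 bytes in the
pseudocode, and the hoisted-difference form assumes p and q point into
the same array, since otherwise q - p is undefined in C.  It says
nothing about what the RTL passes actually emit.

```c
#include <stddef.h>

/* Shape 1: one counter, two indexed addresses p[i] and q[i]. */
long sum_indexed(const int *p, const int *q, size_t count)
{
    long s = 0;
    for (size_t i = 0; i < count; i++)
        s += p[i] + q[i];
    return s;
}

/* Shape 2: two pointer IVs, n and m, each incremented in the loop. */
long sum_two_ivs(const int *p, const int *q, size_t count)
{
    long s = 0;
    const int *n = p, *m = q;
    for (const int *end = p + count; n < end; n++, m++)
        s += *n + *m;
    return s;
}

/* Shape 3: one pointer IV plus a loop-invariant hoisted difference.
   Assumption: p and q point into the same array object.  */
long sum_hoisted_diff(const int *p, const int *q, size_t count)
{
    long s = 0;
    ptrdiff_t diff = q - p;   /* hoisted, live throughout the loop */
    for (const int *n = p, *end = p + count; n < end; n++)
        s += *n + n[diff];    /* second access is base + index */
    return s;
}
```

Shapes 2 and 3 each keep three values live across the loop (n, the end
bound, and either m or diff); the difference is whether the second
access is a plain [m] or a base+index [n + diff].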

Richard
