Richard Sandiford <richard.sandif...@linaro.org> writes:
> Michael Matz <m...@suse.de> writes:
>> On Wed, 6 Jul 2011, Richard Sandiford wrote:
>>> If so, then:
>>>
>>> (a) That doesn't happen at the tree level. The subtraction is still inside
>>> the loop at RTL generation time.
>>>
>>> (b) What's the advantage of introducing a new hoisted subtraction that
>>> is going to be live throughout the loop, and then adding another IV
>>> to it inside the loop, over using the original IV and incrementing it
>>> in the normal way?
>>
>> It can reduce address complexity for one of the addresses. E.g. given:
>>
>> i=0; i < end; i+=4
>> p[i];
>> q[i];
>>
>> -->
>>
>> n=p; n < p+end; n+=4
>> [n];
>> (q-p)[n];
>>
>> Here (q-p) is loop-invariant, and the complexity of the first address is
>> lower (no offset). In fact the register pressure is lower by one too
>> (three instead of four, including the end/p+end bound).
>
> But your second loop isn't what I was comparing it with. I was comparing
> it with:
>
> n=p; n < p+end; n+=4, m+=4
> [n]
> [m]
>
> That has the same number of registers (3) and the same number of
> additions (2). And the [m] is what we started with, so it was
> actually:
>
> i=0; i<count; i+=1, n+=4, m+=4
> [n]
> [m]
>
> -->
>
> i=0; i<count; i+=1, n+=4
> [n]
> (q-p)[n]
>
> (we don't get rid of "i" or "count" in this case.
>
> If the target allows (q-p)[n] to be used directly as an address, and if
> the target has no post-increment instruction, then it might be better.
> But I think it's a loss on other targets. It might even be a loss on
> targets (like PowerPC IIRC), that need base+index addresses to have
> the "real" base first. This sort of transformation seems to make us
> lose track of which register is the base.
Actually, I take that back. Use of (q-p)[n] (hoisted_diff[n]) as an address is precisely the case in which we've decided we _don't_ want to apply the optimisation (see the address_p code I quoted). So I'm not sure when it's a win even on targets like x86.

Richard