On December 14, 2017 8:26:49 PM GMT+01:00, Richard Sandiford <richard.sandif...@linaro.org> wrote: >Jeff Law <l...@redhat.com> writes: >> On 12/14/2017 04:09 AM, Richard Biener wrote: >>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford >>> <richard.sandif...@linaro.org> wrote: >>>> This patch looks for pseudo registers that are live across a call >>>> and for which no call-preserved hard registers exist. It then >>>> recomputes the pseudos as necessary to ensure that they are no >>>> longer live across a call. The comment at the head of the file >>>> describes the approach. >>>> >>>> A new target hook selects which modes should be treated in this >way. >>>> By default none are, in which case the pass is skipped very early. >>>> >>>> It might also be worth looking for cases like: >>>> >>>> C1: R1 := f (...) >>>> ... >>>> C2: R2 := f (...) >>>> C3: R1 := C2 >>>> >>>> and giving the same value number to C1 and C3, effectively treating >>>> it like: >>>> >>>> C1: R1 := f (...) >>>> ... >>>> C2: R2 := f (...) >>>> C3: R1 := f (...) >>>> >>>> Another (much more expensive) enhancement would be to apply value >>>> numbering to all pseudo registers (not just rematerialisation >>>> candidates), so that we can handle things like: >>>> >>>> C1: R1 := f (...R2...) >>>> ... >>>> C2: R1 := f (...R3...) >>>> >>>> where R2 and R3 hold the same value. But the current pass seems >>>> to catch the vast majority of cases. >>>> >>>> Tested on aarch64-linux-gnu (with and without SVE), >x86_64-linux-gnu >>>> and powerpc64le-linux-gnu. OK to install? >>> >>> Can you tell anything about the complexity of the algorithm? > >Have to get back to you on that one. :-) > >>> How does it relate to what LRA can do? AFAIK LRA doesn't try to >find >>> any global optimal solution and previous hardreg assignments may >work >>> against it? > >Yeah, both of those are problems. But the more important problem is >that it can't increase the live ranges of input registers as easily. >Doing it before RA means that IRA gets to see the new ranges. > >>> That said - I would have expected remat to be done before the >>> first scheduling pass? Even before pass_sms (not sure >>> what pass_live_range_shrinkage does). Or be integrated >>> with scheduling and it's register pressure cost model. > >SMS shouldn't be a problem. Early remat wouldn't introduce new >instructions into a loop unless the loop also had a call, which would >prevent SMS. And although it's theoretically possible that it could >remove instructions from a loop, that would only happen if: > > (a) the instruction actually computes the same value every time, so > could have been moved outside the loop; and > > (b) the result is only used after a following call (and in particular > isn't used within the loop itself) > >(a) is a missed optimisation and (b) seems unlikely. > >Integrating remat into scheduling would make it much less powerful, >since scheduling does only limited code motion between blocks. > >Doing it before scheduling would be good in principle, but there >would then need to be a fake dependency between the call and remat >instructions to stop the scheduler moving the remat instructions >back before the call. Adding early remat was a way of avoiding such >fake dependencies in "every" pass, but it might be that scheduling >is one case in which the dependencies make sense. > >Either way, being able to run the pass before scheduling seems >like a future enhancement, blocked on a future enhancement to >the scheduler. > >>> Also I would have expected the approach to apply to all modes, >>> just the cost of spilling is different. But if you can, say, reduce >>> register pressure by one by rematerializing a bit-not then that >>> should be always profitable, no? postreload-cse will come to >>> the rescue anyhow. > >But that would then mean taking the register pressure into account when >deciding whether to rematerialise. On the one hand would make it hard >to do before scheduling (which decides the final pre-RA pressure). >It would also make it a significantly different algorithm, since it >wouldn't be a standard availability problem any more. > >For that use case, pressure-dependent remat in the scheduler might >be a better approach, like you were suggesting. The early remat >pass is specifically for the extreme case of no registers being >call-preserved, where it's more important that we don't miss >remat opportunities, and more important that we treat it as >a global problem.
On x86_64 all xmm registers are caller saved for example. That means all FP regs and all vectors. (yeah, stupid ABI decision....) Richard. >Thanks, >Richard