Richard Biener <richard.guent...@gmail.com> writes:
> On December 14, 2017 8:26:49 PM GMT+01:00, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>>Jeff Law <l...@redhat.com> writes:
>>> On 12/14/2017 04:09 AM, Richard Biener wrote:
>>>> On Fri, Nov 17, 2017 at 4:58 PM, Richard Sandiford
>>>> <richard.sandif...@linaro.org> wrote:
>>>>> This patch looks for pseudo registers that are live across a call
>>>>> and for which no call-preserved hard registers exist. It then
>>>>> recomputes the pseudos as necessary to ensure that they are no
>>>>> longer live across a call. The comment at the head of the file
>>>>> describes the approach.
>>>>>
>>>>> A new target hook selects which modes should be treated in this way.
>>>>> By default none are, in which case the pass is skipped very early.
>>>>>
>>>>> It might also be worth looking for cases like:
>>>>>
>>>>>    C1: R1 := f (...)
>>>>>    ...
>>>>>    C2: R2 := f (...)
>>>>>    C3: R1 := C2
>>>>>
>>>>> and giving the same value number to C1 and C3, effectively treating
>>>>> it like:
>>>>>
>>>>>    C1: R1 := f (...)
>>>>>    ...
>>>>>    C2: R2 := f (...)
>>>>>    C3: R1 := f (...)
>>>>>
>>>>> Another (much more expensive) enhancement would be to apply value
>>>>> numbering to all pseudo registers (not just rematerialisation
>>>>> candidates), so that we can handle things like:
>>>>>
>>>>>    C1: R1 := f (...R2...)
>>>>>    ...
>>>>>    C2: R1 := f (...R3...)
>>>>>
>>>>> where R2 and R3 hold the same value. But the current pass seems
>>>>> to catch the vast majority of cases.
>>>>>
>>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>>> and powerpc64le-linux-gnu. OK to install?
>>>>
>>>> Can you tell anything about the complexity of the algorithm?
>>
>>Have to get back to you on that one. :-)
>>
>>>> How does it relate to what LRA can do? AFAIK LRA doesn't try to find
>>>> any globally optimal solution and previous hardreg assignments may work
>>>> against it?
>>
>>Yeah, both of those are problems.
>>But the more important problem is that it can't increase the live
>>ranges of input registers as easily. Doing it before RA means that
>>IRA gets to see the new ranges.
>>
>>>> That said, I would have expected remat to be done before the
>>>> first scheduling pass? Even before pass_sms (not sure
>>>> what pass_live_range_shrinkage does). Or be integrated
>>>> with scheduling and its register pressure cost model.
>>
>>SMS shouldn't be a problem. Early remat wouldn't introduce new
>>instructions into a loop unless the loop also had a call, which would
>>prevent SMS. And although it's theoretically possible that it could
>>remove instructions from a loop, that would only happen if:
>>
>> (a) the instruction actually computes the same value every time, so
>>     it could have been moved outside the loop; and
>>
>> (b) the result is only used after a following call (and in particular
>>     isn't used within the loop itself)
>>
>>(a) is a missed optimisation and (b) seems unlikely.
>>
>>Integrating remat into scheduling would make it much less powerful,
>>since scheduling does only limited code motion between blocks.
>>
>>Doing it before scheduling would be good in principle, but there
>>would then need to be a fake dependency between the call and remat
>>instructions to stop the scheduler moving the remat instructions
>>back before the call. Adding early remat was a way of avoiding such
>>fake dependencies in "every" pass, but it might be that scheduling
>>is one case in which the dependencies make sense.
>>
>>Either way, being able to run the pass before scheduling seems
>>like a future enhancement, blocked on a future enhancement to
>>the scheduler.
>>
>>>> Also I would have expected the approach to apply to all modes,
>>>> just the cost of spilling is different. But if you can, say, reduce
>>>> register pressure by one by rematerializing a bit-not then that
>>>> should always be profitable, no? postreload-cse will come to
>>>> the rescue anyhow.
>>
>>But that would then mean taking the register pressure into account when
>>deciding whether to rematerialise. On the one hand, that would make it
>>hard to do before scheduling (which decides the final pre-RA pressure).
>>It would also make it a significantly different algorithm, since it
>>wouldn't be a standard availability problem any more.
>>
>>For that use case, pressure-dependent remat in the scheduler might
>>be a better approach, like you were suggesting. The early remat
>>pass is specifically for the extreme case of no registers being
>>call-preserved, where it's more important that we don't miss
>>remat opportunities, and more important that we treat it as
>>a global problem.
>
> On x86_64 all xmm registers are caller-saved, for example. That means all
> FP regs and all vectors. (yeah, stupid ABI decision...)
OK. The patch uses a target hook to select the modes -- basing it off
whether they're variable-length is just the default. So if this turns
out to be a win for x86_64 or for SPARC (I was originally going to reply
to that in Jeff's message, sorry), then the target could opt in if it
wants to.

Thanks,
Richard
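The thread describes early remat as "a standard availability problem". As a rough illustration only (this is not code from the patch, and it does not model how the pass actually chooses remat points), here is the textbook forward "available expressions" dataflow over a toy CFG. The names `Block`, `CandidateSet`, `kMaxCandidates` and `computeAvailableOut` are hypothetical. The sketch also assumes that block 0 is the entry and that each rematerialisation candidate has already been given a bit index.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical upper bound on the number of rematerialisation candidates.
constexpr std::size_t kMaxCandidates = 64;
using CandidateSet = std::bitset<kMaxCandidates>;

// Hypothetical toy CFG block (not GCC's basic_block).
struct Block {
  std::vector<std::size_t> preds;  // predecessor block indices
  CandidateSet gen;   // candidates computed here and still valid at block end
  CandidateSet kill;  // candidates whose input registers are redefined here
};

// Forward "must" availability: a candidate is available on entry to a block
// only if it is available on exit from every predecessor.
//   IN[b]  = intersection of OUT[p] over predecessors p (empty for the entry)
//   OUT[b] = GEN[b] | (IN[b] & ~KILL[b])
std::vector<CandidateSet> computeAvailableOut(const std::vector<Block>& cfg) {
  assert(cfg.empty() || cfg[0].preds.empty());  // block 0 is the entry
  CandidateSet all;
  all.set();
  // Optimistic start (everything available) so that loops converge to the
  // greatest fixed point; each iteration can only clear bits.
  std::vector<CandidateSet> out(cfg.size(), all);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < cfg.size(); ++b) {
      CandidateSet in;  // stays empty for the entry block
      if (b != 0) {
        in.set();
        for (std::size_t p : cfg[b].preds) in &= out[p];
      }
      CandidateSet newOut = cfg[b].gen | (in & ~cfg[b].kill);
      if (newOut != out[b]) {
        out[b] = newOut;
        changed = true;
      }
    }
  }
  return out;
}
```

The optimistic initialisation is what makes this a "must" problem that still handles back edges. For example, a self-loop block keeps whatever its other predecessors make available instead of collapsing to the empty set. Note that a pressure-dependent decision, as discussed in the thread, would not fit into this simple bit-vector form.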
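The patch description suggests giving C1 and C3 the same value number when C3 merely copies the result of an equivalent computation C2. A minimal sketch of that idea for straight-line code follows. It is a hypothetical illustration, not the pass's implementation. `Insn` and `valueNumber` are invented names. The sketch also assumes that the string `op` fully identifies a computation, inputs included, and that those inputs are not redefined in between.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: either "dest := op" (op is an opaque key such
// as "f (...)") or "dest := src" (a register copy). Exactly one is non-empty.
struct Insn {
  std::string dest;
  std::string op;
  std::string src;
};

// Assigns a value number to each instruction. Identical expression keys share
// a number, and a copy inherits the number of the register it copies, so
// "C3: R1 := R2" after "C2: R2 := f (...)" matches "C1: R1 := f (...)".
std::vector<int> valueNumber(const std::vector<Insn>& insns) {
  std::map<std::string, int> exprNumber;  // expression key -> value number
  std::map<std::string, int> regNumber;   // register -> current value number
  std::vector<int> result;
  int next = 0;
  for (const Insn& insn : insns) {
    assert(insn.op.empty() != insn.src.empty());
    int vn;
    if (!insn.src.empty()) {
      // A source with no known definition gets (and keeps) a fresh number.
      auto [it, inserted] = regNumber.emplace(insn.src, next);
      if (inserted) ++next;
      vn = it->second;
    } else {
      auto [it, inserted] = exprNumber.emplace(insn.op, next);
      if (inserted) ++next;
      vn = it->second;
    }
    regNumber[insn.dest] = vn;  // dest now holds this value
    result.push_back(vn);
  }
  return result;
}
```

For the sequence R1 := f (...), R2 := f (...), R1 := R2, all three instructions receive the same number. A subsequent R3 := g (...), R4 := R3 gets a second, distinct number. Extending this to the "much more expensive" variant in the description would mean keying expressions on the value numbers of their operands rather than on an opaque string.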