On 12/14/2017 12:26 PM, Richard Sandiford wrote:
>>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>>> any global optimal solution and previous hardreg assignments may work
>>> against it?
>
> Yeah, both of those are problems.  But the more important problem is
> that it can't increase the live ranges of input registers as easily.
> Doing it before RA means that IRA gets to see the new ranges.
LRA does not work on a global basis.  Its scope is somewhere between a
basic block and an extended basic block.
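For concreteness, the scenario under discussion -- no hard registers preserved across calls, so every value live across a call must either be spilled/reloaded or recomputed -- can be sketched as a toy transform.  This is hypothetical Python, not GCC code; `cheap` is a made-up stand-in for the pass's real remat-candidate test.

```python
# Toy illustration of remat across calls: with no call-preserved
# registers, every value live across a call needs a spill/reload --
# unless it is cheap to recompute, in which case we can rematerialize
# it after the call instead.  Instructions are modeled as tuples.

def remat_across_calls(insns, cheap):
    out, live = [], set()
    for insn in insns:
        if insn[0] == 'def':
            live.add(insn[1])
            out.append(insn)
        elif insn[0] == 'call':
            out.append(insn)
            for r in sorted(live):
                # recompute cheap values; everything else needs a reload
                out.append(('def', r) if r in cheap else ('reload', r))
        else:  # ('use', reg)
            out.append(insn)
    return out

after = remat_across_calls(
    [('def', 'v0'), ('def', 'v1'), ('call',), ('use', 'v0'), ('use', 'v1')],
    cheap={'v0'})
# v0 is recomputed after the call; v1 still needs a spill slot.
```

The point of the example is only the cost asymmetry: the rematerialized value costs one recomputation per call crossed, while the spilled one costs a store plus a load and a stack slot.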
Remat (along with caller-saves) is really just a case of range splitting
in my mind.  So you really want the pass either directly integrated with
IRA or run prior to IRA.

You can do splitting in response to a failure to get a hard register and
try to hook back into IRA to color those new objects.  I had reasonably
good success with that approach when I was looking at the allocators
prior to LRA.  Basically I let IRA do its thing.  When it was done I
walked through the IL splitting ranges to make hard registers available
at key points.  Then I'd call back into IRA (using existing mechanisms)
to try allocation again for the allocnos that had not been colored and
any new ones.  The key was that there was some very simple and easy
range splitting you could do on already-allocated allocnos that in turn
would free up hard registers.

That kind of model doesn't seem to fit here terribly well.  It's not a
lack of hard regs that's the problem, but simply not having any hard
regs available across calls.  So splitting the range of some allocno
that did get a hard register isn't going to help color any of the
allocnos that did not get a register.

If we go back further (circa 1998) we did a pre-allocation range
splitting pass.  We had it working marginally OK, but never really as
well as we wanted.  In that model we looked at pseudos that were likely
going to be hard to allocate and split them into multiple new pseudos.
We tracked the relationship between the new and original pseudos so that
reload could shove them back together in cases where that made the most
sense.  We had copyin/copyout insns to move back and forth between the
range copies and the original pseudo as needed.  I don't remember the
heuristics that drove when/where to split.  Meissner might, since my
recollection is that he did the major lifting there.

But again, I don't think that model works here either.  It did nothing
WRT remat.
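The bookkeeping in that 1998-era model can be sketched roughly like this.  This is a hypothetical Python toy, not the actual pass (which worked on RTL); `split_range`, the string-based insns, and the copy-insn syntax are all made up.  The point is the new-pseudo-to-original map that lets a later pass (reload, in the description above) glue ranges back together when that is cheaper.

```python
# Toy sketch of pre-allocation range splitting: a pseudo's live range is
# cut at chosen points, each tail gets a fresh pseudo connected by a copy
# insn, and the new->original mapping is recorded so ranges can later be
# recombined.  Purely illustrative.

def split_range(insns, pseudo, split_points, next_reg):
    """Split `pseudo`'s live range before each insn index in
    `split_points`.  Returns (new_insns, range_map) where range_map maps
    each new pseudo back to the original one."""
    range_map = {}
    cur = pseudo
    out = []
    for i, insn in enumerate(insns):
        if i in split_points:
            new = f"r{next_reg}"
            next_reg += 1
            out.append(f"{new} = {cur}")   # copy insn at the split point
            range_map[new] = pseudo
            cur = new
        out.append(insn.replace(pseudo, cur))
    return out, range_map

new_insns, rmap = split_range(
    ["r100 = a + b", "use r100", "call foo", "use r100"],
    "r100", split_points={3}, next_reg=200)
# r100's range now ends at the call; r200 covers the post-call use, and
# rmap records that r200 came from r100.
```

After the split, the two shorter ranges can be colored independently, which is exactly what makes splitting useful when pressure (rather than call-clobbering) is the problem.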
I know we pondered remat in the context of revamping caller-saves in the
early 90s to help SPARC FP.  But my recollection is that once we had
caller-saves handling the basics well, the performance gains were enough
that digging into remat was never really explored.

Anyway, that's a bit of history.  IMHO remat has to run prior to
allocation or be integrated with allocation.  In general I'd expect
running before and independent of IRA to be easier to implement, but
slightly less performant than tight integration with IRA.

In addition to potentially avoiding spilling, we get an added benefit
for SVE: we avoid variable-sized stack frames if we can eliminate *all*
instances of SVE registers live across calls.  I'm guessing those are
relatively rare to begin with, based on comments within the actual code.

>
>>> That said - I would have expected remat to be done before the
>>> first scheduling pass?  Even before pass_sms (not sure
>>> what pass_live_range_shrinkage does).  Or be integrated
>>> with scheduling and its register pressure cost model.
>
> SMS shouldn't be a problem.  Early remat wouldn't introduce new
> instructions into a loop unless the loop also had a call, which would
> prevent SMS.  And although it's theoretically possible that it could
> remove instructions from a loop, that would only happen if:
>
> (a) the instruction actually computes the same value every time, so
>     could have been moved outside the loop; and
>
> (b) the result is only used after a following call (and in particular
>     isn't used within the loop itself)
>
> (a) is a missed optimisation and (b) seems unlikely.
>
> Integrating remat into scheduling would make it much less powerful,
> since scheduling does only limited code motion between blocks.
>
> Doing it before scheduling would be good in principle, but there
> would then need to be a fake dependency between the call and remat
> instructions to stop the scheduler moving the remat instructions
> back before the call.
> Adding early remat was a way of avoiding such
> fake dependencies in "every" pass, but it might be that scheduling
> is one case in which the dependencies make sense.
>
> Either way, being able to run the pass before scheduling seems
> like a future enhancement, blocked on a future enhancement to
> the scheduler.
I'd expect it to be post-scheduling, simply because otherwise you have
to ensure scheduling doesn't muck it back up.  And there's been talk t

>
>>> Also I would have expected the approach to apply to all modes,
>>> just the cost of spilling is different.  But if you can, say, reduce
>>> register pressure by one by rematerializing a bit-not then that
>>> should be always profitable, no?  postreload-cse will come to
>>> the rescue anyhow.
>
> But that would then mean taking the register pressure into account when
> deciding whether to rematerialise.  On the one hand that would make it
> hard to do before scheduling (which decides the final pre-RA pressure).
> It would also make it a significantly different algorithm, since it
> wouldn't be a standard availability problem any more.
Right.  In general you don't want to remat or range split speculatively --
otherwise you end up doing a lot of work later to pull things back
together.  Been there, done that.  Pressure estimation and costing
models are very important here, as the scope of things under
consideration is great and the possibility of making things worse is
very real.

>
> For that use case, pressure-dependent remat in the scheduler might
> be a better approach, like you were suggesting.  The early remat
> pass is specifically for the extreme case of no registers being
> call-preserved, where it's more important that we don't miss
> remat opportunities, and more important that we treat it as
> a global problem.
This case feels different, largely because we're talking about a fairly
uncommon case and the cost if we hit that case is very high.
ie, we've got a larger amount of wiggle room if we don't get the costing
model terribly accurate.

jeff