On Tue, Jan 09, 2018 at 12:23:42PM +0000, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
> > On Mon, Jan 08, 2018 at 0:25:47PM +0000, Wilco Dijkstra wrote:
> >> > Always pairing two registers together *also* degrades code quality.
> >>
> >> No, while it's not optimal, it means smaller code and fewer memory
> >> accesses.
> >
> > It means you execute *more* memory accesses.  Always.  This may be
> > sometimes hidden, sure.  I'm not saying you do not want more ldp's;
> > I'm saying this particular strategy is very far from ideal.
>
> No it means less since the number of memory accesses reduces (memory
> bandwidth may increase but that's not an issue).

The problem is that *more* memory accesses are executed at runtime,
which is why separate shrink-wrapping does what it does: to have *fewer*
executed.  (It's not just the direct execution cost that makes this
help: more important are latencies to dependent ops, microarchitectural
traps, etc.)

If you make A always stored whenever B is, and the other way around,
the optimal place to do it will always store at least as often as
either A or B, _but can also store more often than either_.

> >> That may well be the problem. So if there are N predecessors, of which N-1
> >> need to restore the same set of callee saves, but one was shrinkwrapped,
> >> N-1 copies of the same restores might be emitted. N could be the number
> >> of blocks in a function - I really hope it doesn't work out like that...
> >
> > In the worst case it would.  OTOH, joining every combo into blocks costs
> > O(2**C) (where C is the # components) bb's worst case.
> >
> > It isn't a simple problem.  The current tuning works pretty well for us,
> > but no doubt it can be improved!
>
> Well if there are C components, we could limit the total number of
> saves/restores inserted to say 4C. Similarly common cases could easily
> share the restores without increasing the number of branches.

It is common to see many saves/restores generated for the exceptional
cases.


Segher
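
A rough illustration of the point above about always storing A whenever
B is stored. This is a toy model in Python, not how GCC's separate
shrink-wrapping computes placement: the three paths, their execution
counts, and the helper names separate_saves and paired_saves are all
made up for the example. It only counts how often a save would execute
if it were placed exactly on the paths that need it.

```python
# Toy model (an assumption for illustration, not GCC's algorithm): each
# path through a function has an execution count and the set of
# callee-saved components that must be saved on that path.
paths = [
    (30, {"A"}),   # hypothetical path that needs only A saved
    (30, {"B"}),   # hypothetical path that needs only B saved
    (40, set()),   # hypothetical fast path that needs neither
]


def separate_saves(paths, component):
    """Executed saves of one component when it is wrapped on its own."""
    return sum(count for count, needed in paths if component in needed)


def paired_saves(paths, pair):
    """Executed saves of the pair when both are always saved together.

    The pair has to be saved on every path that needs either member.
    """
    return sum(count for count, needed in paths if needed & pair)


a_alone = separate_saves(paths, "A")
b_alone = separate_saves(paths, "B")
together = paired_saves(paths, {"A", "B"})

print(a_alone, b_alone, together)  # prints: 30 30 60
```

With these made-up counts, A and B are each saved 30 times when handled
separately, but the forced pair is saved 60 times, which is more often
than either component alone. Whether the paired store is still cheaper
overall depends on the target and is not modelled here.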