On Tue, Jan 09, 2018 at 12:23:42PM +0000, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
> > On Mon, Jan 08, 2018 at 0:25:47PM +0000, Wilco Dijkstra wrote:
> >> > Always pairing two registers together *also* degrades code quality.
> >> 
> >> No, while it's not optimal, it means smaller code and fewer memory 
> >> accesses.
> >
> > It means you execute *more* memory accesses.  Always.  This may be
> > sometimes hidden, sure.  I'm not saying you do not want more ldp's;
> > I'm saying this particular strategy is very far from ideal.
> 
> No it means less since the number of memory accesses reduces (memory
> bandwidth may increase but that's not an issue).

The problem is that *more* memory accesses are executed at runtime, which
is why separate shrink-wrapping does what it does: to have *fewer* executed.
(The direct execution cost is not the only reason this helps: latencies to
dependent ops, microarchitectural traps, etc. matter even more.)

If you make A always stored whenever B is, and the other way around, the
optimal place to do it will always store at least as often as either A
or B, _but can also store more often than either_.
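
To make that concrete, here is a toy model. It is not what GCC does; the
paths, execution counts, and function names are all made up for
illustration. Each path through a function has an execution count and the
set of components it needs saved, and we count how often a save executes
when A and B are wrapped separately versus always paired:

```python
# Simplified model: each path through the function has an execution count
# and the set of callee-saved components it needs saved (all hypothetical).

def executed_saves(paths, component):
    """How often a separately wrapped component is saved."""
    return sum(count for count, needed in paths if component in needed)

def executed_pair_saves(paths, pair):
    """How often the pair is saved when either member forces both."""
    return sum(count for count, needed in paths if needed & set(pair))

paths = [
    (10, {"A"}),   # path that needs only A
    (10, {"B"}),   # path that needs only B
    (80, set()),   # fast path that needs neither
]

a = executed_saves(paths, "A")
b = executed_saves(paths, "B")
pair = executed_pair_saves(paths, ("A", "B"))
print(a, b, pair)  # prints: 10 10 20
```

In this model the paired save executes 20 times, while A alone and B alone
each execute only 10 times: the pair is stored more often than either.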

> >> That may well be the problem. So if there are N predecessors, of which N-1
> >> need to restore the same set of callee saves, but one was shrinkwrapped,
> >> N-1 copies of the same restores might be emitted. N could be the number
> >> of blocks in a function - I really hope it doesn't work out like that...
> >
> > In the worst case it would.  OTOH, joining every combo into blocks costs
> > O(2**C) (where C is the # components) bb's worst case.
> >
> > It isn't a simple problem.  The current tuning works pretty well for us,
> > but no doubt it can be improved!
> 
> Well if there are C components, we could limit the total number of
> saves/restores inserted to say 4C. Similarly common cases could easily
> share the restores without increasing the number of branches.

It is common to see many saves/restores generated for the exceptional cases.


Segher
