On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:
> On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
> >The head comment starts with
> >
> >+/* Separate shrink-wrapping
> >+
> >+   Instead of putting all of the prologue and epilogue in one spot, we
> >+   can put parts of it in places where those components are executed less
> >+   frequently.
> >
> >and that is the long and short of it.
>
> And that comment puzzles me. Surely prologue and epilogue are executed
> only once currently, so how does frequency come into it? Again - please
> provide an example.

If some component is only needed for 0.01% of executions of a function,
running it once for every execution is 10000 times too much.

The trivial example is a function that does an early exit, but uses one
or a few non-volatile registers before that exit.  This happens in e.g.
glibc's malloc, if you want an easily accessed example.  With the current
code, *all* components will be saved and then restored shortly afterwards.

> >The full-prologue algorithm makes as many blocks run without prologue as
> >possible, by duplicating blocks where that helps.  If you do this for
> >every component you can end up with 2**40 blocks for just 40 components,
>
> Ok, so why wouldn't we use the existing code with the duplication part
> disabled?

That would not perform nearly as well.

> That's a later addition anyway and isn't necessary to do
> shrink-wrapping in the first place.

No, it always did that, just not as often (it only duplicated straight-line
code before).


Segher
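
P.S. A minimal sketch of the early-exit shape described above.  This is
not glibc's malloc; `probe`, `slow_step` and `lookup` are hypothetical names
made up for illustration, and which registers a compiler actually picks
depends on the target and on whether the calls get inlined.  The point is
only that `table` and `n` stay live across the first call (so a few
non-volatile registers are plausibly needed before the early exit), while
the rarely executed tail keeps more values live across calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter so the two paths can be observed. */
static int slow_calls;

/* Hypothetical helper: stands in for any call made before the early exit. */
static size_t probe(size_t key)
{
    return key & 3;
}

/* Hypothetical helper: stands in for the rarely executed work. */
static size_t slow_step(size_t a, size_t b, size_t c)
{
    slow_calls++;
    return a * 31 + b * 17 + c;
}

size_t lookup(const size_t *table, size_t n, size_t key)
{
    assert(table != NULL);

    /* `table` and `n` are live across this call, so one or a few
       non-volatile registers are likely needed already here. */
    size_t h = probe(key);

    /* Hot path: early exit.  Nothing below is needed for this return. */
    if (h < n && table[h] != 0)
        return table[h];

    /* Cold path: more values live across calls, so more components of
       the prologue (register saves) are plausibly needed only here. */
    size_t a = slow_step(key, n, 1);
    size_t b = slow_step(key, a, 2);
    size_t c = slow_step(a, b, 3);
    return a + b + c;
}
```

With one prologue for the whole function, every save the cold path wants
would also run on the hot path; placing those saves separately lets them
run only when the cold path is actually taken.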