On Thu, Sep 08, 2016 at 10:58:13AM -0600, Jeff Law wrote: > >And that comment puzzles me. Surely prologue and epilogue are executed > >only once currently, so how does frequency come into it? Again - please > >provide an example. > Right, they're executed once currently. But the prologue could be sunk > into locations where they are not executed every time the function is > called. That's the basis behind shrink wrapping.
Right. > Segher's code essentially allows individual components of the prologue > to sink to different points within the function rather than forcing the > prologue to be sunk as an atomic unit. It also allows prologue an epilogue components to be placed in multiple places, and even allows them to be executed more than once, if that is cheaper. > >>The full-prologue algorithm makes as many blocks run without prologue as > >>possible, by duplicating blocks where that helps. If you do this for > >>every component you can and up with 2**40 blocks for just 40 components, > > > >Ok, so why wouldn't we use the existing code with the duplication part > >disabled? That's a later addition anyway and isn't necessary to do > >shrink-wrapping in the first place. > I think the concern here is the balance between code explosion and the > runtime gains. The code explosion can be terrible if you have only a few components, already. The runtime gains are not so great either. Here's a simple example, showing part of a cfg, the exits to the right are calls to abort and need some prologue component: | 1 |\ | \ 2 X1 |\ | \ | X2 | With the "old" algorithm, you need to place the prologue at 1 (because you can only have one), and then the fall-through path also necessarily gets that component's prologue. With the "separate" algorithm, the component prologues can be placed at X1 and X2 instead (and that will most likely be cheapest according to the cost model, too). You can also put this in a loop. I'll try to make a simple example with that. Segher