On 08/26/2016 09:03 AM, Bernd Schmidt wrote:
On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
The head comment starts with
+/* Separate shrink-wrapping
+
+ Instead of putting all of the prologue and epilogue in one spot, we
+ can put parts of it in places where those components are executed less
+ frequently.
and that is the long and short of it.
And that comment puzzles me. Surely prologue and epilogue are executed
only once currently, so how does frequency come into it? Again - please
provide an example.
Right, they're executed once currently. But the prologue could be sunk
into a location where it is not executed every time the function is
called. That's the basis behind shrink wrapping.
Segher's code essentially allows individual components of the prologue
to sink to different points within the function rather than forcing the
prologue to be sunk as an atomic unit.
Conceptually you could run the standard algorithm on each independent
component.
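
To make the per-component idea concrete, here is a toy sketch in Python.
It is not GCC's algorithm and not Segher's patch: the block names,
frequencies and component names (save_r30, save_r31, save_lr) are all
invented, each group of components is simply placed at the deepest block
that dominates every block needing it, and epilogues are ignored.

```python
# Toy model only. Every name and number below is hypothetical.

# Immediate dominator of each block in a small control-flow graph.
IDOM = {"entry": None, "fast": "entry", "slow": "entry",
        "slow_a": "slow", "slow_b": "slow"}

# How often each block runs per 1000 calls of the function.
FREQ = {"entry": 1000, "fast": 900, "slow": 100, "slow_a": 60, "slow_b": 40}

# Blocks that need each prologue component (e.g. a saved register).
NEEDS = {"save_r30": {"slow_a"}, "save_r31": {"slow_b"}, "save_lr": {"slow"}}


def dominators(block):
    """Return the dominator chain of `block`, starting with the block itself."""
    chain = []
    while block is not None:
        chain.append(block)
        block = IDOM[block]
    return chain


def common_dominator(blocks):
    """Return the deepest block that dominates every block in `blocks`."""
    blocks = list(blocks)
    shared = set(dominators(blocks[0]))
    for b in blocks[1:]:
        shared &= set(dominators(b))
    # Walking up from any block, the first shared ancestor is the deepest.
    return next(b for b in dominators(blocks[0]) if b in shared)


def placement_cost(groups):
    """Count component executions when each group is placed as one unit."""
    total = 0
    for comps in groups:
        users = set().union(*(NEEDS[c] for c in comps))
        total += FREQ[common_dominator(users)] * len(comps)
    return total


# Whole prologue sunk as one atomic unit: all three components land in "slow".
atomic_cost = placement_cost([list(NEEDS)])
# Each component sunk on its own, to the block that actually needs it.
separate_cost = placement_cost([[c] for c in NEEDS])
print(atomic_cost, separate_cost)  # prints "300 200"
```

In this made-up example the atomic placement executes 300 component
saves per 1000 calls, while the per-component placement executes 200,
because save_r30 and save_r31 each move into the one sub-path that uses
them.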
The full-prologue algorithm makes as many blocks run without prologue as
possible, by duplicating blocks where that helps. If you do this for
every component you can end up with 2**40 blocks for just 40 components,
Ok, so why wouldn't we use the existing code with the duplication part
disabled? That's a later addition anyway and isn't necessary to do
shrink-wrapping in the first place.
I think the concern here is the balance between code explosion and the
runtime gains.
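
For a sense of scale on the explosion side, the arithmetic behind the
2**40 figure can be sketched as below. The helper name is hypothetical,
and the reading that a block might need one copy per subset of
already-set-up components is my assumption about where the worst case
comes from, not something stated in the patch.

```python
def worst_case_copies(num_components):
    """Upper bound on copies of one block under the assumed model:
    one copy for every subset of components that could already be set up."""
    return 2 ** num_components


for n in (1, 2, 10, 40):
    print(n, worst_case_copies(n))
# The last line printed is "40 1099511627776".
```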
jeff