On 08/26/2016 10:27 AM, Segher Boessenkool wrote:
On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:
On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
The head comment starts with
+/* Separate shrink-wrapping
+
+ Instead of putting all of the prologue and epilogue in one spot, we
+ can put parts of it in places where those components are executed less
+ frequently.
and that is the long and short of it.
And that comment puzzles me. Surely prologue and epilogue are executed
only once currently, so how does frequency come into it? Again - please
provide an example.
If some component is only needed for 0.01% of executions of a function,
running it once for every execution is 10000 times too much.
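
The arithmetic behind that figure, as a small C sketch. The function names are hypothetical and the model is deliberately crude: it assumes the component's save/restore costs the same wherever it is placed, and that it is needed on exactly one call in `one_in`.

```c
/* How many times the component's save/restore really has to run
   if it is needed on only one call in `one_in`.  */
unsigned long runs_needed(unsigned long calls, unsigned long one_in)
{
    return one_in ? calls / one_in : calls;
}

/* Ratio between "run it on every call" and "run it only when needed".
   Returns 0 when the component is never needed in `calls` calls.  */
unsigned long overhead_factor(unsigned long calls, unsigned long one_in)
{
    unsigned long needed = runs_needed(calls, one_in);
    return needed ? calls / needed : 0;
}
```

With 1,000,000 calls and a component needed on one call in 10,000 (0.01%), the component has to run 100 times, so running it on every call is 10000 times the necessary work.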
The trivial example is a function that does an early exit, but uses one
or a few non-volatile registers before that exit. This happens in e.g.
glibc's malloc, if you want an easily accessed example. With the current
code, *all* components will be saved and then restored shortly afterwards.
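
A minimal C sketch of the shape of function being described, assuming a toy allocator. All names here (`toy_alloc`, `toy_free`, `refill`) are hypothetical and this is not glibc's malloc. Whether a compiler actually uses non-volatile registers on either path depends on the target, the options, and inlining decisions, so treat the comments as an illustration of the idea rather than a claim about generated code.

```c
#include <stddef.h>

/* Hypothetical toy allocator; the names are illustrative, not glibc's.  */
#define SLOTS 4
#define SLOT_SIZE 64

static char pool[SLOTS][SLOT_SIZE];
static void *cache[SLOTS];   /* recently freed slots */
static size_t cached;        /* entries currently in cache */
static size_t next_slot;     /* next never-used slot in pool */
static size_t slow_calls;    /* how often the rare path ran */

/* Rare path: in a real allocator this would be the large part that
   wants several non-volatile registers.  */
static void *refill(size_t n)
{
    slow_calls++;
    if (n > SLOT_SIZE || next_slot == SLOTS)
        return NULL;
    return pool[next_slot++];
}

void toy_free(void *p)
{
    if (p != NULL && cached < SLOTS)
        cache[cached++] = p;
}

void *toy_alloc(size_t n)
{
    /* Early exit: served from the cache, touching very little state.  */
    if (n <= SLOT_SIZE && cached > 0)
        return cache[--cached];

    /* Ideally only this path pays for the saves and restores that the
       rare code needs (for example if refill is inlined here).  */
    return refill(n);
}
```

If the register saves for the rare path are all emitted at function entry, every cache hit pays for them too; placing each component only on the path that needs it avoids that.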
So can you expand on the malloc example a bit -- I'm pretty sure I
understand what you're trying to do, but a concrete example may help
Bernd and be useful for archival purposes.
I also know that Carlos is interested in the malloc example -- so I'd
like to be able to pass that along to him.
Given the multiple early exit and fast paths through the allocator, I'm
not at all surprised that sinking different components of the prologue
to different locations is useful.
Also, if there's a case where sinking into a loop occurs, definitely
point that out.
The full-prologue algorithm makes as many blocks run without prologue as
possible, by duplicating blocks where that helps. If you do this for
every component you can end up with 2**40 blocks for just 40 components,
Ok, so why wouldn't we use the existing code with the duplication part
disabled?
That would not perform nearly as well.
That's a later addition anyway and isn't necessary to do
shrink-wrapping in the first place.
No, it always did that, just not as often (it only duplicated straight-line
code before).
Presumably (I haven't looked yet), the duplication is so that we can
isolate one or more paths which in turn allows sinking the prologue
further on some of those paths.
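
For scale, the 2**40 figure quoted earlier in the thread follows from treating each component as an independent yes/no choice per duplicated block. A short C sketch of that worst-case count, assuming that simple model; the function name is hypothetical and nothing like it is claimed to exist in GCC.

```c
#include <stdint.h>

/* Worst-case number of distinct copies of a block if every component
   can independently be "already set up" or "not set up" on entry.
   Saturates instead of overflowing for 64 or more components.  */
uint64_t max_block_variants(unsigned components)
{
    return components < 64 ? (uint64_t)1 << components : UINT64_MAX;
}
```

For 40 components this gives 1099511627776 possible variants of a single block, which is why duplicating per component does not scale the way duplicating for one whole prologue does.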
This is something I'll definitely want to look at -- block duplication
to facilitate code elimination (or in this case avoid code insertion)
hits several areas of interest to me -- and how we balance duplication
vs runtime savings is always interesting.
Jeff