On 09/30/2016 04:34 AM, Segher Boessenkool wrote:
[ whoops, message too big, resending with the attachment compressed ]
On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
With the transposition issue addressed, the only blocker I see is the
lack of some simple testcases we can add to the suite. They don't have
to be very extensive. And one motivating example for the list archives,
ideally the glibc malloc case.
And here is the malloc testcase.
A very important (for performance) function is _int_malloc, which starts
with
[ ... ]
Thanks. What I think is important to note with this example are the
bits that were pushed into the path with the sysmalloc/alloc_perturb
calls. That's an unlikely path.
We have to extrapolate a bit from the assembly provided. In the version
without separate shrink-wrapping, we have a full prologue of stores and
two instances of a full epilogue (though only one ever executes).
With separate shrink-wrapping, the (presumably) very cold path where we
error out has virtually no prologue/epilogue. That's probably a nop from
a performance standpoint.
More interesting is the path where we call sysmalloc/alloc_perturb. It's
a cold path, but not as cold as the error path. We save/restore 4 regs
in that case rather than running a full prologue/epilogue. So there's
clearly a savings there, though again, per the expect, it's a cold path.
Where we have to extrapolate is the hot path. Presumably on the hot
path we're saving/restoring ~4 fewer registers. I haven't verified
that, but that is kind of the whole point here.
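
For the archives, here is a minimal sketch of the general shape being
discussed. It is not the glibc code: tiny_alloc, slow_alloc, pool and
pool_used are all hypothetical names made up for illustration. The hot
path does no calls, while the path marked unlikely with __builtin_expect
keeps a value live across a call, which is the kind of place where
register saves/restores could be sunk to. Whether a given compiler
actually does that depends on the target and options.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cold helper such as sysmalloc. */
__attribute__((noinline)) static void *slow_alloc(size_t n)
{
    return malloc(n);
}

/* Hypothetical fixed pool; alignment is ignored to keep the sketch short. */
static char pool[64];
static size_t pool_used;

void *tiny_alloc(size_t n)
{
    /* Cold path: marked unlikely, and n stays live across the call. */
    if (__builtin_expect(n > sizeof pool - pool_used, 0)) {
        void *p = slow_alloc(n);
        if (p)
            memset(p, 0, n);
        return p;
    }

    /* Hot path: no calls, so ideally little or no save/restore work. */
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}
```

Comparing the assembly for something like this with and without
separate shrink-wrapping would be one way to check where the
saves/restores end up.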
Thanks,
Jeff