Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

Segher Boessenkool Thu, 17 Nov 2016 06:45:14 -0800

Hi Kyrill,

On Thu, Nov 17, 2016 at 02:22:08PM +0000, Kyrill Tkachov wrote:
> >>>>>>I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there
> >>>>>>were some interesting swings.
> >>>>>>458.sjeng     +1.45%
> >>>>>>471.omnetpp   +2.19%
> >>>>>>445.gobmk     -2.01%
> >>>>>>
> >>>>>>On SPECFP:
> >>>>>>453.povray    +7.00%


> After looking at the gobmk performance with performance counters it looks
> like more icache pressure. I see an increase in misses.
> This looks to me like an effect of code size increase, though it is not
> that large an increase (0.4% with SWS).

Right.  I don't see how to improve on this (but see below); ideas welcome :-)

> Branch mispredicts also go up a bit but not as much as icache misses.

I don't see that happening -- for some testcases we get unlucky and have
more branch predictor aliasing, and for some we have less; it's pretty
random.  Some testcases are really sensitive to this.

> I don't think there's anything we can do here, or at least that this patch
> can do about it.
> Overall, there's a slight improvement in SPECINT, even with the gobmk
> regression, and a slightly larger improvement on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?

> Segher, one curious artifact I spotted while looking at codegen differences
> in gobmk was a case where we fail to emit load-pairs as effectively in the
> epilogue and its preceding basic block.
> So before we had this epilogue:
> .L43:
>     ldp    x21, x22, [sp, 16]
>     ldp    x23, x24, [sp, 32]
>     ldp    x25, x26, [sp, 48]
>     ldp    x27, x28, [sp, 64]
>     ldr    x30, [sp, 80]
>     ldp    x19, x20, [sp], 112
>     ret
> 
> and I see this becoming (among numerous other changes in the function):
> 
> .L69:
>     ldp    x21, x22, [sp, 16]
>     ldr    x24, [sp, 40]
> .L43:
>     ldp    x25, x26, [sp, 48]
>     ldp    x27, x28, [sp, 64]
>     ldr    x23, [sp, 32]
>     ldr    x30, [sp, 80]
>     ldp    x19, x20, [sp], 112
>     ret
> 
> So this is better in the cases where we jump straight into .L43 because we
> load fewer registers, but worse when we jump to or fall through to .L69
> because x23 and x24 are now restored using two loads rather than a single
> load-pair. This hunk isn't critical to performance in gobmk though.

Is loading/storing a pair as cheap as loading/storing a single register?
In that case you could shrink-wrap per pair of registers instead.
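
To make the trade-off in that hunk concrete, here is a rough Python model that
counts load instructions on each epilogue path, first with one component per
register (as in the quoted output) and then with one component per register
pair. This is only a sketch, not GCC code: the names SLOTS, pair_components,
count_loads and path_loads are made up for illustration, the pairing rule
(adjacent 8-byte slots, paired from the lowest offset up) is an assumption,
and the per-pair placement of x23/x24 into .L43 is a guess at what per-pair
components would produce. The slot offsets are read off the quoted epilogue.

```python
# Illustrative model only (not GCC code): count load instructions on each
# epilogue path when components are single registers vs. register pairs.

# Save-slot offsets in bytes from sp, read off the quoted epilogue.
SLOTS = {"x19": 0, "x20": 8, "x21": 16, "x22": 24, "x23": 32, "x24": 40,
         "x25": 48, "x26": 56, "x27": 64, "x28": 72, "x30": 80}


def pair_components(slots):
    """Group registers whose slots are 8 bytes apart into pairs (assumed rule)."""
    regs = sorted(slots, key=slots.get)
    comps, i = [], 0
    while i < len(regs):
        if i + 1 < len(regs) and slots[regs[i + 1]] - slots[regs[i]] == 8:
            comps.append((regs[i], regs[i + 1]))
            i += 2
        else:
            comps.append((regs[i],))
            i += 1
    return comps


def count_loads(block_regs, slots=SLOTS):
    """Loads needed to restore block_regs within one block: one ldp per pair
    that is fully present, one ldr per remaining register."""
    total = 0
    for comp in pair_components(slots):
        present = [r for r in comp if r in block_regs]
        total += 1 if len(present) == len(comp) else len(present)
    return total


def path_loads(placement, blocks):
    """Total loads executed along a path that runs through the given blocks."""
    return sum(count_loads(placement[b]) for b in blocks)


# Restores per block as in the quoted per-register output.
PER_REGISTER = {
    "L69": {"x21", "x22", "x24"},
    "L43": {"x19", "x20", "x23", "x25", "x26", "x27", "x28", "x30"},
}

# Hypothetical placement if x23/x24 were a single component needed in .L43.
PER_PAIR = {
    "L69": {"x21", "x22"},
    "L43": {"x19", "x20", "x23", "x24", "x25", "x26", "x27", "x28", "x30"},
}

if __name__ == "__main__":
    for name, placement in (("per-register", PER_REGISTER), ("per-pair", PER_PAIR)):
        via_l69 = path_loads(placement, ["L69", "L43"])  # .L69 falls through to .L43
        via_l43 = path_loads(placement, ["L43"])
        print(f"{name}: via .L69 = {via_l69} loads, via .L43 = {via_l43} loads")
```

Under these assumptions the .L69 path drops from 7 loads to 6 and the .L43
path stays at 5, at the price of restoring x24 on the .L43 path as well. That
is only a win if the pair really is as cheap as the single load.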


Segher
