Hi Kyrill,

On Thu, Nov 17, 2016 at 02:22:08PM +0000, Kyrill Tkachov wrote:
> >>>>>>I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there
> >>>>>>were
> >>>>>>some interesting swings.
> >>>>>>458.sjeng     +1.45%
> >>>>>>471.omnetpp   +2.19%
> >>>>>>445.gobmk     -2.01%
> >>>>>>
> >>>>>>On SPECFP:
> >>>>>>453.povray    +7.00%
> After looking at the gobmk performance with performance counters it looks
> like more icache pressure.
> I see an increase in misses.
> This looks to me like an effect of code size increase, though it is not
> that large an increase (0.4% with SWS).

Right.  I don't see how to improve on this (but see below); ideas welcome :-)

> Branch mispredicts also go up a bit but not as much as icache misses.

I don't see that happening -- for some testcases we get unlucky and have
more branch predictor aliasing, and for some we have less; it's pretty
random.  Some testcases are really sensitive to this.

> I don't think there's anything we can do here, or at least that this patch
> can do about it.
> Overall, there's a slight improvement in SPECINT, even with the gobmk
> regression and a slightly larger improvement
> on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?

> Segher, one curious artifact I spotted while looking at codegen differences
> in gobmk was a case where we fail
> to emit load-pairs as effectively in the epilogue and its preceding basic
> block.
> So before we had this epilogue:
> .L43:
>         ldp     x21, x22, [sp, 16]
>         ldp     x23, x24, [sp, 32]
>         ldp     x25, x26, [sp, 48]
>         ldp     x27, x28, [sp, 64]
>         ldr     x30, [sp, 80]
>         ldp     x19, x20, [sp], 112
>         ret
>
> and I see this becoming (among numerous other changes in the function):
>
> .L69:
>         ldp     x21, x22, [sp, 16]
>         ldr     x24, [sp, 40]
> .L43:
>         ldp     x25, x26, [sp, 48]
>         ldp     x27, x28, [sp, 64]
>         ldr     x23, [sp, 32]
>         ldr     x30, [sp, 80]
>         ldp     x19, x20, [sp], 112
>         ret
>
> So this is better in the cases where we jump straight into .L43 because we
> load fewer registers
> but worse when we jump to or fallthrough to .L69 because x23 and x24 are
> now restored using two loads
> rather than a single load-pair. This hunk isn't critical to performance in
> gobmk though.

Is loading/storing a pair as cheap as loading/storing a single register?
In that case you could shrink-wrap per pair of registers instead.

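A rough sketch of what "per pair" could mean, in Python rather than in
the backend itself.  Everything here is hypothetical: the names
pair_components and restores_needed are made up for illustration, and
this is not how GCC's separate shrink-wrapping hooks are implemented.
The save-slot offsets are the ones visible in the epilogue quoted above.
It assumes a pair costs the same as a single load, which is exactly the
open question.

```python
# Hypothetical sketch: treat two registers with adjacent 8-byte save slots
# as ONE shrink-wrapping component, so they are always restored together
# with a single ldp instead of two ldr instructions.

def pair_components(slots):
    """Group registers whose save slots are adjacent (offset, offset + 8)."""
    ordered = sorted(slots.items(), key=lambda kv: kv[1])
    components = []
    i = 0
    while i < len(ordered):
        reg, off = ordered[i]
        if i + 1 < len(ordered) and ordered[i + 1][1] == off + 8:
            # Adjacent slots: one component, one ldp/stp.
            components.append((reg, ordered[i + 1][0]))
            i += 2
        else:
            # No partner: the register stays a component of its own.
            components.append((reg,))
            i += 1
    return components

def restores_needed(components, needed_regs):
    """A component is restored when ANY of its registers needs restoring."""
    return [c for c in components if any(r in needed_regs for r in c)]

# Save-slot offsets (bytes from sp) as seen in the quoted epilogue.
SLOTS = {
    "x19": 0, "x20": 8, "x21": 16, "x22": 24, "x23": 32, "x24": 40,
    "x25": 48, "x26": 56, "x27": 64, "x28": 72, "x30": 80,
}

components = pair_components(SLOTS)
# On a path that needs only x24, the whole (x23, x24) pair is restored.
only_x24 = restores_needed(components, {"x24"})
```

With these offsets x23 and x24 end up in the same component, so the
split into "ldr x24" before .L43 and "ldr x23" after it could not
happen; the price is that a path needing only one of the two restores
both.
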
Segher