On Wed, 2019-06-12 at 14:03:17 UTC, Nicholas Piggin wrote: > current may be cached by the compiler, so remove the volatile asm > restriction. This results in better generated code, as well as being > smaller and fewer dependent loads, it can avoid store-hit-load flushes > like this one that shows up in irq_exit(): > > preempt_count_sub(HARDIRQ_OFFSET); > if (!in_interrupt() && ...) > > Which ends up as: > > ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET; > if (((struct thread_info *)current)->preempt_count ... > > Evaluating current twice presently means it has to be loaded twice, and > here gcc happens to pick a different register each time, then > preempt_count is accessed via that base register: > > 1058: ld r10,2392(r13) <-- current > 105c: lwz r9,0(r10) <-- preempt_count > 1060: addis r9,r9,-1 > 1064: stw r9,0(r10) <-- preempt_count > 1068: ld r9,2392(r13) <-- current > 106c: lwz r9,0(r9) <-- preempt_count > 1070: rlwinm. r9,r9,0,11,23 > 1074: bne 1090 <irq_exit+0x60> > > This can frustrate store-hit-load detection heuristics and cause > flushes. Allowing the compiler to cache current in a reigster with this > patch results in the same base register being used for all accesses, > which is more likely to be detected as an alias: > > 1058: ld r31,2392(r13) > ... > 1070: lwz r9,0(r31) > 1074: addis r9,r9,-1 > 1078: stw r9,0(r31) > 107c: lwz r9,0(r31) > 1080: rlwinm. r9,r9,0,11,23 > 1084: bne 10a0 <irq_exit+0x60> > > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/e354d7dc81d0e81bea33165f381aff1eda45f5d9 cheers