Benjamin Herrenschmidt <b...@kernel.crashing.org> writes:
> On Mon, 2017-12-04 at 16:09 +1100, Michael Ellerman wrote:
>> Nicholas Piggin <npig...@gmail.com> writes:
>>
>> > When an interrupt is returning to a soft-disabled context (which can
>> > happen for non-maskable interrupts or synchronous interrupts), it goes
>> > through the motions of soft-disabling again, including calling
>> > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()).
>> >
>> > This is not necessary, because we must already be soft-disabled in the
>> > interrupt context, it also may be causing crashes in the irq tracing
>> > code to re-enter as an nmi. Replace it with a warning to ensure that
>> > soft-interrupts are still disabled.
>> >
>> > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>> > ---
>> >  arch/powerpc/kernel/entry_64.S | 10 +++++++---
>> >  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> So this patch is the core of the bug fix I gather.
>>
>> Git blames says:
>>
>>   Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state
>>   getting out of sync")
>>   Cc: sta...@vger.kernel.org # v3.4+
>>
>> But I'm wondering how this has been broken that long without us
>> noticing? You hit it doing some sort of perf stress test I think - so is
>> it just that we've never pushed hard enough? Or did something change to
>> expose this? Or we're just not sure?
>
> We have some traps that do local_irq_enable ... you may want to double
> check instruction emu, page faults, alignment etc... I wouldn't be
> surprised if we have case where an interrupt "returns" soft enabled.

AFAICT those all check that they're coming from a soft-enabled context:

	if (!arch_irq_disabled_regs(regs))
		local_irq_enable();

And the code Nick is patching is the case where we're returning to a
soft-disabled context.

So I think the patch is good.

cheers
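
For anyone following along, the argument above can be sketched as a small userspace model. This is not kernel code: every `model_*` name is hypothetical, the single global flag stands in for the real per-CPU soft-mask state, and the model assumes the handler is entered soft-disabled. It only illustrates why a trap handler guarded by the `arch_irq_disabled_regs(regs)` check should never leave interrupts soft-enabled when returning to a soft-disabled context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for pt_regs: only the soft-mask state of the
 * interrupted context is modelled. */
struct model_regs {
	bool soft_disabled;
};

/* Hypothetical stand-in for the current CPU's soft-mask state. */
static bool model_cpu_soft_disabled;

/* Models arch_irq_disabled_regs(): was the interrupted context soft-disabled? */
static bool model_irq_disabled_regs(const struct model_regs *regs)
{
	return regs->soft_disabled;
}

/* Models local_irq_enable(): soft-enable interrupts on this CPU. */
static void model_local_irq_enable(void)
{
	model_cpu_soft_disabled = false;
}

/* Trap handler following the guarded pattern quoted above. */
static void model_trap_handler(const struct model_regs *regs)
{
	/* Assumption of this model: the handler is entered soft-disabled. */
	model_cpu_soft_disabled = true;

	if (!model_irq_disabled_regs(regs))
		model_local_irq_enable();

	/* ... the actual trap handling would happen here ... */
}

/* Models the kind of check the patch adds: returning to a soft-disabled
 * context while currently soft-enabled would be a bug. */
static bool model_return_would_warn(const struct model_regs *regs)
{
	return regs->soft_disabled && !model_cpu_soft_disabled;
}

/* Exercises both cases: interrupted context soft-disabled and soft-enabled. */
static void model_self_check(void)
{
	struct model_regs from_disabled = { .soft_disabled = true };
	struct model_regs from_enabled = { .soft_disabled = false };

	model_trap_handler(&from_disabled);
	assert(model_cpu_soft_disabled);
	assert(!model_return_would_warn(&from_disabled));

	model_trap_handler(&from_enabled);
	assert(!model_cpu_soft_disabled);
	assert(!model_return_would_warn(&from_enabled));
}
```

In this model the warning condition can only trigger if some handler calls the enable path without the guard, which is the case Ben was asking about.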