On Mon, 04 Dec 2017 16:09:57 +1100 Michael Ellerman <m...@ellerman.id.au> wrote:
> Nicholas Piggin <npig...@gmail.com> writes:
>
> > When an interrupt is returning to a soft-disabled context (which can
> > happen for non-maskable interrupts or synchronous interrupts), it goes
> > through the motions of soft-disabling again, including calling
> > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()).
> >
> > This is not necessary, because we must already be soft-disabled in the
> > interrupt context, it also may be causing crashes in the irq tracing
> > code to re-enter as an nmi. Replace it with a warning to ensure that
> > soft-interrupts are still disabled.
> >
> > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
> > ---
> > arch/powerpc/kernel/entry_64.S | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
>
> So this patch is the core of the bug fix I gather.
>
> Git blames says:
>
> Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state
> getting out of sync")
> Cc: sta...@vger.kernel.org # v3.4+
>
> But I'm wondering how this has been broken that long without us
> noticing? You hit it doing some sort of perf stress test I think - so is
> it just that we've never pushed hard enough? Or did something change to
> expose this? Or we're just not sure?

I'm not really sure. A customer hit it, during either a stress test or
long running workload with lockdep irq tracing and perf running at the
same time. I don't have a lot more details but we might be able to get
some offline if necessary.

Thanks,
Nick