On Wed, Sep 05, 2018 at 09:33:34PM -0400, Steven Rostedt wrote:
>   do_idle {
>
>     [interrupts enabled]
>
>     <interrupt> [interrupts disabled]
>         TRACE_IRQS_OFF [lockdep says irqs off]
>         [...]
>         TRACE_IRQS_IRET
>             test if pt_regs say return to interrupts enabled [yes]
>             TRACE_IRQS_ON [lockdep says irqs are on]
>
>             <nmi>
>                 nmi_enter() {
>                     printk_nmi_enter() [traced by ftrace]
>                     [ hit ftrace breakpoint ]
>                     <breakpoint exception>
>                         TRACE_IRQS_OFF [lockdep says irqs off]
>                         [...]
>                         TRACE_IRQS_IRET [return from breakpoint]
>                             test if pt_regs say interrupts enabled [no]
>                             [iret back to interrupt]
>         [iret back to code]
>
>     tick_nohz_idle_enter() {
>
>         lockdep_assert_irqs_enabled() [lockdep say no!]

Isn't the problem that we muck with the IRQ state from NMI context? We
shouldn't be doing that.

The thing is, we trace the IRQ state from within the IRQ-disabled
region, because that's the only IRQ-safe option, and that makes it very
much not NMI-safe.

Your patch might avoid the symptom, but I don't think it cures the
fundamental problem.
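
To make the sequence above concrete, here is a rough userspace model of
it. This is only a sketch, not kernel code: struct cpu_model, model_nmi(),
simulate() and the nmi_guard switch are all hypothetical names invented
for illustration, and the guard is just one way to picture "don't touch
the IRQ state from NMI context", not a proposed patch.

```c
#include <stdbool.h>

/* Userspace model only; none of this is kernel code. */
struct cpu_model {
	bool hw_irqs_on;	/* what the CPU flags actually say */
	bool sw_irqs_on;	/* what the lockdep-style tracking believes */
	bool in_nmi;		/* set while the modelled NMI handler runs */
	bool nmi_guard;		/* hypothetical: skip tracking updates in NMI */
};

static void trace_irqs_off(struct cpu_model *c)
{
	if (c->nmi_guard && c->in_nmi)
		return;		/* assumption: leave the state alone in NMI */
	c->sw_irqs_on = false;
}

static void trace_irqs_on(struct cpu_model *c)
{
	if (c->nmi_guard && c->in_nmi)
		return;
	c->sw_irqs_on = true;
}

/* Models TRACE_IRQS_IRET: mark IRQs on only if we return to IRQs-on code. */
static void trace_irqs_iret(struct cpu_model *c, bool regs_irqs_on)
{
	if (regs_irqs_on)
		trace_irqs_on(c);
}

/* NMI whose entry path hits a breakpoint, i.e. a nested exception. */
static void model_nmi(struct cpu_model *c)
{
	c->in_nmi = true;
	trace_irqs_off(c);		/* breakpoint exception entry */
	trace_irqs_iret(c, false);	/* interrupted NMI context had IRQs off */
	c->in_nmi = false;
}

/* Returns true if the tracking agrees with hardware once idle code resumes. */
bool simulate(bool nmi_guard)
{
	struct cpu_model c = {
		.hw_irqs_on = true,
		.sw_irqs_on = true,
		.in_nmi = false,
		.nmi_guard = nmi_guard,
	};

	c.hw_irqs_on = false;		/* <interrupt> */
	trace_irqs_off(&c);
	trace_irqs_iret(&c, true);	/* tracking says on, iret not done yet */
	model_nmi(&c);			/* NMI lands in exactly that window */
	c.hw_irqs_on = true;		/* iret back to code */

	return c.sw_irqs_on == c.hw_irqs_on;
}
```

In this model simulate(false) leaves the tracking saying "irqs off" while
the hardware has them on, which is the state the assertion in
tick_nohz_idle_enter() trips over. simulate(true) keeps the two in sync
because the NMI path never writes the tracked state.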