On Fri, 7 Sep 2018 09:34:48 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

> On Wed, Sep 05, 2018 at 09:33:34PM -0400, Steven Rostedt wrote:
> >   do_idle {
> > 
> >     [interrupts enabled]
> > 
> >     <interrupt> [interrupts disabled]
> >     TRACE_IRQS_OFF [lockdep says irqs off]
> >     [...]
> >     TRACE_IRQS_IRET
> >         test if pt_regs say return to interrupts enabled [yes]
> >         TRACE_IRQS_ON [lockdep says irqs are on]
> > 
> >         <nmi>
> >             nmi_enter() {
> >                 printk_nmi_enter() [traced by ftrace]
> >                 [ hit ftrace breakpoint ]
> >                 <breakpoint exception>
> >                     TRACE_IRQS_OFF [lockdep says irqs off]
> >                     [...]
> >                     TRACE_IRQS_IRET [return from breakpoint]
> >                        test if pt_regs say interrupts enabled [no]
> >                        [iret back to interrupt]
> >        [iret back to code]
> > 
> >     tick_nohz_idle_enter() {
> > 
> >     lockdep_assert_irqs_enabled() [lockdep say no!]  
> 
> Isn't the problem that we muck with the IRQ state from NMI context? We
> shouldn't be doing that.

Not really. The problem is that the exception handling code jumps into
lockdep, and we allow exceptions to happen in NMIs. It's just that the
tracing of printk_nmi_enter() caused an exception before lockdep was
turned off.

> 
> The thing is, since we trace the IRQ state from within IRQ-disable,
> since that's the only IRQ-safe option, it is very much not NMI-safe.
> 
> Your patch might avoid the symptom, but I don't think it cures the
> fundamental problem.

No, the issue is that the nmi_enter() code was traced early which adds
breakpoints and breakpoints causes exceptions which jumps into lockdep,
and we did this before lockdep was turned off.

-- Steve

Reply via email to