On Wed, Feb 03, 2016 at 10:23:42AM -0700, Jeffrey Merkey wrote:
> > Hmm, I am confused here.  So you are saying because we are in the nmi
> > handler you can not break into the system?  The nmi handler prints some
> > stuff to the screen, pokes the other cpus to print stuff to the screen and
> > then returns to a normal operation.  Unless you are saying the act of
> > sending NMI IPIs never completes (because a cpu is blocking IPI
> > interrupts), so the cpu hangs in nmi context and the debugger never has
> > a chance to 'break' in and see what is going on?
> >
> > Cheers,
> > Don
> >
> 
> Yes.  The nmi handlers never complete for the bug I worked on with
> tglx, probably because an nmi handler is calling timekeeper.c
> somewhere.  Some of these lockup bugs may be calling code from the nmi
> handlers that causes the lockup condition in the first place, so it
> will never reach a call to panic.  Looking over this code
> it's damn hard to find a good way to do this that works across all the
> arches without adding another macro to bug.h (BREAK_ON maybe), so I
> just used one that's already there.  I'll go back and rethink this
> some more.  It could just be as simple as calling panic from the first
> detection -- that works.

So, if you disable 'sysctl_hardlockup_all_cpu_backtrace' and enable
'hardlockup_panic', you should be able to achieve what you want, no?
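
For reference, a rough sketch of that configuration as a small script. It
assumes the two knobs are exposed as files under /proc/sys/kernel with the
names hardlockup_all_cpu_backtrace and hardlockup_panic, which depends on the
kernel version and config; the function name and the directory parameter are
made up for illustration. It needs root to write to the real sysctl files.

```python
from pathlib import Path

# Assumed sysctl file names; check that they exist on your kernel first.
SETTINGS = {
    "hardlockup_all_cpu_backtrace": "0",  # no NMI IPIs to other cpus for backtraces
    "hardlockup_panic": "1",              # panic on the first hard lockup detection
}


def apply_hardlockup_settings(sysctl_dir="/proc/sys/kernel"):
    """Write each setting whose knob exists; return the names actually written."""
    written = []
    for name, value in SETTINGS.items():
        knob = Path(sysctl_dir) / name
        if not knob.exists():
            # Knob missing: this kernel does not expose it, so skip it.
            continue
        knob.write_text(value + "\n")
        written.append(name)
    return written
```

Calling apply_hardlockup_settings() as root applies both values; the returned
list shows which knobs were present, so a missing one is easy to spot.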

But you mentioned you wanted to recover?  Hence avoiding the panic?

Cheers,
Don
