Excerpts from Daniel Axtens's message of November 12, 2021 4:08 pm: > Hi, > >> The printk layer at the moment does not seem to have a good way to force >> flush printk messages that are created in NMI context, except in the >> panic path. >> >> NMI-context printk messages normally get to the console with irq_work, >> but that won't help if the CPU is stuck with irqs disabled, as can be >> the case for hard lockup watchdog messages. >> >> The watchdog currently flushes the printk buffers after detecting a >> lockup on remote CPUs, but they may not have processed their NMI IPI >> yet by that stage, or they may have self-detected a lockup in which >> case they won't go via this NMI IPI path. >> >> Improve the situation by having NMI-context mark a flag if it called >> printk, and have watchdog timer interrupts check if that flag was set >> and try to flush if it was. Latency is not a big problem because we >> were already stuck for a while, just need to try to make sure the >> messages eventually make it out.
Sorry just coming back to this now the printk fix was merged upstream. > Initially I was surprised that this doesn't affect the printk code > itself, just the powerpc code... Yeah I'm actually not sure how other watchdogs handle this. If they use nmi_trigger_cpumask_backtrace() then that does have a flush. >> Cc: Laurent Dufour <lduf...@linux.ibm.com> >> Signed-off-by: Nicholas Piggin <npig...@gmail.com> >> --- >> This patch is actually based on top of this one which is planned to go >> upstream in rc1/2. https://marc.info/?l=linux-kernel&m=163626070312052&w=2 >> >> Prior to commit 93d102f094be that is fixed by the above, we had a printk >> flush function with a different name but basically does the same job, so >> this patch can be backported, just needs some care. I'm posting it for >> review now for feedback so it's ready to go when the printk patch is >> upstream. >> >> Thanks, >> Nick >> >> arch/powerpc/kernel/watchdog.c | 29 +++++++++++++++++++++++------ >> 1 file changed, 23 insertions(+), 6 deletions(-) >> >> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c >> index b6533539386b..a7b6b0691203 100644 >> --- a/arch/powerpc/kernel/watchdog.c >> +++ b/arch/powerpc/kernel/watchdog.c >> @@ -86,6 +86,7 @@ static DEFINE_PER_CPU(u64, wd_timer_tb); >> /* SMP checker bits */ >> static unsigned long __wd_smp_lock; >> static unsigned long __wd_reporting; >> +static unsigned long __wd_nmi_output; >> static cpumask_t wd_smp_cpus_pending; >> static cpumask_t wd_smp_cpus_stuck; >> static u64 wd_smp_last_reset_tb; >> @@ -154,6 +155,18 @@ static void wd_lockup_ipi(struct pt_regs *regs) >> else >> dump_stack(); >> >> + /* >> + * We printk()ed from NMI context, the contents may not get flushed >> + * if we return to a context with interrupts disabled because >> + * printk uses irq_work to schedule flushes of NMI output. >> + * __wd_nmi_output says there has been output from NMI context, so >> + * other CPUs are recruited to help flush it. >> + * >> + * xchg is not needed here (it could be a simple atomic store), but >> + * it gives the memory ordering and atomicity required. >> + */ >> + xchg(&__wd_nmi_output, 1); >> + >> /* Do not panic from here because that can recurse into NMI IPI layer */ >> } > > I think, looking at this and the other site where __wd_nmi_output is > set, that this works because you set the flag only when you are done > printing from the non-panic lockup context on this CPU. I was initially > worried that you set this flag part way through printing, and then it > might get cleared by another CPU while you're still trying to print. > However, in this function it's right at the end - there's nothing else > left to do, and ... > >> DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) >> @@ -386,6 +401,8 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt) >> print_irqtrace_events(current); >> show_regs(regs); >> >> + xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi >> + >> if (sysctl_hardlockup_all_cpu_backtrace) >> trigger_allbutself_cpu_backtrace(); > > in this one, the only things that can happen afterwards are > - a panic, which does its own flushing, and > > - trigger_allbutself_cpu_backtrace(), which seems to just send IPIs, not > do any printing of its own. Yeah, on powerpc that actually ends up not using NMI IPIs because that's dangerous and core code uses it for less severe things than a hard lockup so we disabled it, so if you can take a regular IPI then you can run irq work when it returns and print things. If we changed it to use NMI IPIs I guess we might want to do something, maybe. > All of which is fine, but I wonder if we need a more descriptive > variable name or if the comment needs to specify that the flag should > only be set at certain times. I did try to add some explanation in that comment. I'll see if that can be made a bit more readable and expand on it a bit. Thanks, Nick