On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]> wrote:
>On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>>> > other CPUs. Do not ask sys_info() to handle that bit again later in the
>>> > panic path.
>>> >
>>> > Use sys_info_with_filter() so panic_print=all_bt does not request more
>>> > output after the CPUs are stopped.
>>> >
>>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>>> > Cc: [email protected]
>>> > Signed-off-by: Bradley Morgan <[email protected]>
>>> > ---
>>> >  kernel/panic.c | 2 +-
>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/kernel/panic.c b/kernel/panic.c
>>> > index 213725b612aa..eb842823df61 100644
>>> > --- a/kernel/panic.c
>>> > +++ b/kernel/panic.c
>>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>>> >  	 */
>>> >  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>> >
>>> > -	sys_info(panic_print);
>>> > +	sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>>>
>>> Hmm, this prevents printing backtraces from all CPUs completely.
>>> But what if they were not printed?
>>>
>>> They might be printed by:
>>>
>>> static void panic_other_cpus_shutdown(bool crash_kexec)
>>> {
>>> 	if (panic_print & SYS_INFO_ALL_BT)
>>> 		panic_trigger_all_cpu_backtrace();
>>>
>>> 	[...]
>>> }
>>>
>>> But it checks only "panic_print" variable. It won't do anything
>>> when (panic_print == 0).
>>>
>>> In this case, we might still want to print the backraces when
>>> SYS_INFO_ALL_BT is set in kernel_si_info.
>>>
>>> >  	kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>>>
>>> Of course, we might fix panic_other_cpus_shutdown() to check also
>>> kernel_si_info.
>>>
>>> But it all becomes very hairy. We have several levels:
>>>
>>>   + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>>>
>>>   + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>>>
>>>   + panic-specific si_info: panic_print
>>>
>>>   + universal fallback for any layer: kernel_si_info
>>>
>>> Now, we try to check all these variables back and forth to
>>> trigger all backtraces or to avoid triggering them.
>>> And it clearly does not work well and the code is more and more
>>> hairy.
>>>
>>> I think about another approach. The word "waterfall" comes to my mind.
>>> Instead of checking all the settings back and forth, let's process
>>> each setting one by one and just remember what has been done and
>>> skip this in the next level.
>>>
>>> All the si_info actions seems to dump a global system state.
>>> So, it would make sense to remember the state in a global variable
>>> even when it might be modified by more CPUs in parallel.
>>>
>>> I am going to think more about it.
>>
>>I have created a POC using Gemini. I haven't tested it.
>>But it looks acceptable. And the logic seems to be more
>>straightforward.
>>
>>One drawback is that it requires adding the _reset()
>>call for all sys_info() callers. It is fine in principle
>>but it might complicate back-porting because all changes
>>have to be done in one patch.
>>
>>But honestly, this is a nice to have fix. Most people could
>>live happily without it.
>>
>>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>>From: Petr Mladek <[email protected]>
>>Date: Fri, 26 Jun 2026 13:55:41 +0200
>>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
>> backtraces
>>
>>In watchdog, panic, and hung task detection scenarios, sys_info() can
>>be called multiple times or alongside direct backtrace triggers like
>>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>>being dumped repeatedly from all CPUs, cluttering the kernel log and
>>delaying or obscuring critical debug details.
>>
>>Introduce a state tracking bitmask and associated helpers:
>>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>>- sys_info_reset(): Resets the tracking state.
>>- sys_info_is_done(mask): Checks if all bits in the mask have been printed.
>>
>>Update sys_info() to automatically filter out already printed bits
>>using this state. Integrate these APIs with the generic hardlockup
>>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>>and the panic core. This ensures that each piece of system information
>>and backtrace output is printed at most once per lockup/panic event,
>>and the state is reset cleanly when a lockup does not trigger a panic.
>>
>>Races between sys_info() callers are ignored. It should be acceptable
>>because the output from various watchdogs has never been synchronized.
>>And panic() never returns.
>>
>>Assisted-by: gemini-1.5-flash ?
>
>Why not use gemini 3.5 flash?
>
>I can try if you want.
>
>Could I have the prompt you used? :)
>
>>Signed-off-by: Petr Mladek <[email protected]>
>>---
>> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
>> include/linux/sys_info.h       |  3 +++
>> kernel/hung_task.c             |  2 ++
>> kernel/panic.c                 |  4 +++-
>> kernel/watchdog.c              | 10 ++++++++--
>> lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
>> 6 files changed, 55 insertions(+), 7 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>>index c40c69368476..0eab7894b9dc 100644
>>--- a/arch/powerpc/kernel/watchdog.c
>>+++ b/arch/powerpc/kernel/watchdog.c
>>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
>> 	if (sysctl_hardlockup_all_cpu_backtrace ||
>> 	    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> 		trigger_allbutcpu_cpu_backtrace(cpu);
>>+		sys_info_done(SYS_INFO_ALL_BT);
>> 		cpumask_clear(&wd_smp_cpus_ipi);
>> 	} else {
>> 		/*
>>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
>> 		}
>> 	}
>>
>>-	sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+	sys_info(hardlockup_si_mask);
>> 	if (hardlockup_panic)
>> 		nmi_panic(NULL, "Hard LOCKUP");
>>
>>+	sys_info_reset();
>>+
>> 	wd_end_reporting();
>>
>> 	return;
>>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>> 		xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>>
>> 		if (sysctl_hardlockup_all_cpu_backtrace ||
>>-		    (hardlockup_si_mask & SYS_INFO_ALL_BT))
>>+		    (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>> 			trigger_allbutcpu_cpu_backtrace(cpu);
>>+			sys_info_done(SYS_INFO_ALL_BT);
>>+		}
>>
>>-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+		sys_info(hardlockup_si_mask);
>> 		if (hardlockup_panic)
>> 			nmi_panic(regs, "Hard LOCKUP");
>>
>>+		sys_info_reset();
>>+
>> 		wd_end_reporting();
>> 	}
>> 	/*
>>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>>index a5bc3ea3d44b..ad43548c75dd 100644
>>--- a/include/linux/sys_info.h
>>+++ b/include/linux/sys_info.h
>>@@ -18,6 +18,9 @@
>> #define SYS_INFO_BLOCKED_TASKS		0x00000080
>>
>> void sys_info(unsigned long si_mask);
>>+void sys_info_done(unsigned long si_mask);
>>+void sys_info_reset(void);
>>+bool sys_info_is_done(unsigned long si_mask);
>> unsigned long sys_info_parse_param(char *str);
>>
>> #ifdef CONFIG_SYSCTL
>>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>index 6fcc94ce4ca9..dbb6a27770f5 100644
>>--- a/kernel/hung_task.c
>>+++ b/kernel/hung_task.c
>>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>
>> 	if (hung_task_call_panic)
>> 		panic("hung_task: blocked tasks");
>>+
>>+	sys_info_reset();
>> }
>>
>> static long hung_timeout_jiffies(unsigned long last_checked,
>>diff --git a/kernel/panic.c b/kernel/panic.c
>>index 213725b612aa..86ce17f03da2 100644
>>--- a/kernel/panic.c
>>+++ b/kernel/panic.c
>>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>>  */
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>>-	if (panic_print & SYS_INFO_ALL_BT)
>>+	if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
>> 		panic_trigger_all_cpu_backtrace();
>>+		sys_info_done(SYS_INFO_ALL_BT);
>>+	}
>>
>> 	/*
>> 	 * Note that smp_send_stop() is the usual SMP shutdown function,
>>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>index 87dd5e0f6968..f431087c68a7 100644
>>--- a/kernel/watchdog.c
>>+++ b/kernel/watchdog.c
>>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>>
>> 		if (hardlockup_all_cpu_backtrace) {
>> 			trigger_allbutcpu_cpu_backtrace(cpu);
>>+			sys_info_done(SYS_INFO_ALL_BT);
>> 			if (!hardlockup_panic)
>> 				clear_bit_unlock(0, &hard_lockup_nmi_warn);
>> 		}
>>
>>-		sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+		sys_info(hardlockup_si_mask);
>> 		if (hardlockup_panic)
>> 			nmi_panic(regs, "Hard LOCKUP");
>>
>>+		sys_info_reset();
>>+
>> 		per_cpu(watchdog_hardlockup_warned, cpu) = true;
>> 	}
>>
>>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>>
>> 		if (softlockup_all_cpu_backtrace) {
>> 			trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>>+			sys_info_done(SYS_INFO_ALL_BT);
>> 			if (!softlockup_panic)
>> 				clear_bit_unlock(0, &soft_lockup_nmi_warn);
>> 		}
>>
>> 		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>>-		sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+		sys_info(softlockup_si_mask);
>> 		thresh_count = duration / get_softlockup_thresh();
>>
>> 		if (softlockup_panic && thresh_count >= softlockup_panic)
>> 			panic("softlockup: hung tasks");
>>+
>>+		sys_info_reset();
>> 	}
>>
>> 	return HRTIMER_RESTART;
>>diff --git a/lib/sys_info.c b/lib/sys_info.c
>>index f32a06ec9ed4..f8e6176fae75 100644
>>--- a/lib/sys_info.c
>>+++ b/lib/sys_info.c
>>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
>> 		show_state_filter(TASK_UNINTERRUPTIBLE);
>> }
>>
>>+static unsigned long sys_info_done_mask;
>>+
>>+void sys_info_done(unsigned long si_mask)
>>+{
>>+	sys_info_done_mask |= si_mask;
>>+}
>>+
>>+void sys_info_reset(void)
>>+{
>>+	sys_info_done_mask = 0;
>>+}
>>+
>>+bool sys_info_is_done(unsigned long si_mask)
>>+{
>>+	return (sys_info_done_mask & si_mask) == si_mask;
>>+}
>>+
>> void sys_info(unsigned long si_mask)
>> {
>>-	__sys_info(si_mask ? : kernel_si_mask);
>>+	unsigned long mask;
>>+
>>+	if (si_mask)
>>+		mask = si_mask & ~sys_info_done_mask;
>>+	else
>>+		mask = kernel_si_mask & ~sys_info_done_mask;
>>+
>>+	if (!mask)
>>+		return;
>>+
>>+	__sys_info(mask);
>>+	sys_info_done(mask);
>> }
>>
>
>Thanks!

Hmm, new idea: kernel/dump_filter.c?

This file could implement a generic lockup state machine, so that any
subsystem can record what it has already dumped. I know it may add some
bloat, but it's better than cramming more fixes in.

What do you guys think? Maybe we could start an RFC for this?

Thanks!
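
P.S. To make the idea a bit more concrete, here is a rough userspace
sketch of the kind of tracker I have in mind. Everything in it is an
assumption for illustration only: the dump_filter_* names, the struct,
and the DUMP_* bit values are hypothetical and do not exist in the tree
(the real bits are the SYS_INFO_* defines in include/linux/sys_info.h).
It just models the "waterfall" from the POC above: each level claims
the bits that nobody has dumped yet, and reset() ends the event. Like
the POC, it ignores races between callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define DUMP_ALL_BT		0x1UL
#define DUMP_BLOCKED_TASKS	0x2UL

/* Hypothetical state: which dumps were already emitted for this event. */
struct dump_filter {
	unsigned long done_mask;
};

/* Start a new lockup/panic event: nothing has been dumped yet. */
static void dump_filter_reset(struct dump_filter *df)
{
	df->done_mask = 0;
}

/* Record that the caller dumped these bits itself. */
static void dump_filter_mark(struct dump_filter *df, unsigned long mask)
{
	df->done_mask |= mask;
}

/* True only when every bit in @mask was already dumped. */
static bool dump_filter_is_done(const struct dump_filter *df,
				unsigned long mask)
{
	return (df->done_mask & mask) == mask;
}

/*
 * Return the subset of @requested that still needs dumping and mark it
 * done, so that the next level in the waterfall skips it.
 * No locking or atomics: races are ignored here, as in the POC.
 */
static unsigned long dump_filter_claim(struct dump_filter *df,
				       unsigned long requested)
{
	unsigned long todo = requested & ~df->done_mask;

	df->done_mask |= todo;
	return todo;
}

/* Walk one event through watchdog level, panic level, and reset. */
static void dump_filter_demo(void)
{
	struct dump_filter df;
	unsigned long todo;

	dump_filter_reset(&df);

	/* Watchdog level triggered the all-CPU backtraces directly. */
	dump_filter_mark(&df, DUMP_ALL_BT);

	/* Panic level asks for backtraces plus blocked tasks. */
	todo = dump_filter_claim(&df, DUMP_ALL_BT | DUMP_BLOCKED_TASKS);
	assert(todo == DUMP_BLOCKED_TASKS);

	/* A later fallback level finds nothing left to print. */
	todo = dump_filter_claim(&df, DUMP_ALL_BT | DUMP_BLOCKED_TASKS);
	assert(todo == 0);

	/* The lockup did not end in panic: forget the state. */
	dump_filter_reset(&df);
	assert(!dump_filter_is_done(&df, DUMP_ALL_BT));
}
```

A per-event struct instead of one global mask is just one possible
design choice; whether that is worth it over the global variable in
the POC is exactly what an RFC could sort out.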
