On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]>
wrote:
>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping the
>> > other CPUs. Do not ask sys_info() to handle that bit again later in the
>> > panic path.
>> > 
>> > Use sys_info_with_filter() so panic_print=all_bt does not request more
>> > output after the CPUs are stopped.
>> > 
>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>> > Cc: [email protected]
>> > Signed-off-by: Bradley Morgan <[email protected]>
>> > ---
>> >  kernel/panic.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/kernel/panic.c b/kernel/panic.c
>> > index 213725b612aa..eb842823df61 100644
>> > --- a/kernel/panic.c
>> > +++ b/kernel/panic.c
>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>> >     */
>> >    atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >  
>> > -  sys_info(panic_print);
>> > +  sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>> 
>> Hmm, this prevents printing backtraces from all CPUs completely.
>> But what if they were not printed?
>> 
>> They might be printed by:
>> 
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>>      if (panic_print & SYS_INFO_ALL_BT)
>>              panic_trigger_all_cpu_backtrace();
>> 
>> [...]
>> }
>> 
>> But it checks only the "panic_print" variable. It won't do anything
>> when (panic_print == 0).
>> 
>> In this case, we might still want to print the backtraces when
>> SYS_INFO_ALL_BT is set in kernel_si_mask.
>> 
>> >    kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>> 
>> Of course, we might fix panic_other_cpus_shutdown() to also check
>> kernel_si_mask.
>> 
>> But it all becomes very hairy. We have several levels:
>> 
>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> 
>>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>> 
>>    + panic-specific si_info: panic_print
>> 
>>    + universal fallback for any layer: kernel_si_mask
>> 
>> Now, we try to check all these variables back and forth to
>> trigger all backtraces or to avoid triggering them.
>> And it clearly does not work well and the code is more and more
>> hairy.
>> 
>> I am thinking about another approach. The word "waterfall" comes to my mind.
>> Instead of checking all the settings back and forth, let's process
>> each setting one by one and just remember what has been done and
>> skip this in the next level.
>> 
>> All the si_info actions seem to dump a global system state.
>> So, it would make sense to remember the state in a global variable
>> even when it might be modified by more CPUs in parallel.
>> 
>> I am going to think more about it.
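
For what it's worth, here is how I read the waterfall idea, as a rough
userspace model rather than kernel code. All demo_* names and bit values
are made up for illustration; only the order of the levels and the
fallback to a generic mask when a level's mask is zero come from the
discussion above.

```c
#include <stdio.h>

/* Hypothetical bits for this model only; not the kernel's SYS_INFO_* values. */
#define DEMO_ALL_BT	0x1UL
#define DEMO_MEM	0x2UL
#define DEMO_TIMERS	0x4UL

/* Bits already dumped during the current lockup/panic event. */
static unsigned long demo_done_mask;

/*
 * Process one level of the waterfall. A zero mask falls back to the
 * generic one. Only bits that no earlier level handled are dumped.
 * Returns the bits this level actually dumped.
 */
static unsigned long demo_level(const char *level, unsigned long mask,
				unsigned long fallback)
{
	unsigned long todo = (mask ? mask : fallback) & ~demo_done_mask;

	if (todo)
		printf("%s: dumping bits 0x%lx\n", level, todo);

	demo_done_mask |= todo;
	return todo;
}

/* Start a new event, e.g. after a lockup report that did not panic. */
static void demo_reset(void)
{
	demo_done_mask = 0;
}
```

With a watchdog mask of DEMO_ALL_BT | DEMO_MEM followed by a panic mask of
DEMO_ALL_BT | DEMO_TIMERS, the watchdog level dumps both of its bits and
the panic level dumps only DEMO_TIMERS.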
>
>I have created a POC using Gemini. I haven't tested it.
>But it looks acceptable. And the logic seems to be more
>straightforward.
>
>One drawback is that it requires adding the _reset()
>call for all sys_info() callers. It is fine in principle
>but it might complicate back-porting because all changes
>have to be done in one patch.
>
>But honestly, this is a nice-to-have fix. Most people could
>live happily without it.
>
>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>From: Petr Mladek <[email protected]>
>Date: Fri, 26 Jun 2026 13:55:41 +0200
>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent duplicate
> backtraces
>
>In watchdog, panic, and hung task detection scenarios, sys_info() can
>be called multiple times or alongside direct backtrace triggers like
>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>being dumped repeatedly from all CPUs, cluttering the kernel log and
>delaying or obscuring critical debug details.
>
>Introduce a state tracking bitmask and associated helpers:
>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>- sys_info_reset(): Resets the tracking state.
>- sys_info_is_done(mask): Checks if all bits in the mask have been printed.
>
>Update sys_info() to automatically filter out already printed bits
>using this state. Integrate these APIs with the generic hardlockup
>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>and the panic core. This ensures that each piece of system information
>and backtrace output is printed at most once per lockup/panic event,
>and the state is reset cleanly when a lockup does not trigger a panic.
>
>Races between sys_info() callers are ignored. It should be acceptable
>because the output from various watchdogs has never been synchronized.
>And panic() never returns.
>
>Assisted-by: gemini-1.5-flash ?

Why not use gemini 3.5 flash?

I can try if you want. 

Could I have the prompt you used? :)

>Signed-off-by: Petr Mladek <[email protected]>
>---
> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
> include/linux/sys_info.h       |  3 +++
> kernel/hung_task.c             |  2 ++
> kernel/panic.c                 |  4 +++-
> kernel/watchdog.c              | 10 ++++++++--
> lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
> 6 files changed, 55 insertions(+), 7 deletions(-)
>
>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>index c40c69368476..0eab7894b9dc 100644
>--- a/arch/powerpc/kernel/watchdog.c
>+++ b/arch/powerpc/kernel/watchdog.c
>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
>       if (sysctl_hardlockup_all_cpu_backtrace ||
>           (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>               trigger_allbutcpu_cpu_backtrace(cpu);
>+              sys_info_done(SYS_INFO_ALL_BT);
>               cpumask_clear(&wd_smp_cpus_ipi);
>       } else {
>               /*
>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
>               }
>       }
> 
>-      sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+      sys_info(hardlockup_si_mask);
>       if (hardlockup_panic)
>               nmi_panic(NULL, "Hard LOCKUP");
> 
>+      sys_info_reset();
>+
>       wd_end_reporting();
> 
>       return;
>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>               xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
> 
>               if (sysctl_hardlockup_all_cpu_backtrace ||
>-                  (hardlockup_si_mask & SYS_INFO_ALL_BT))
>+                  (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>                       trigger_allbutcpu_cpu_backtrace(cpu);
>+                      sys_info_done(SYS_INFO_ALL_BT);
>+              }
> 
>-              sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+              sys_info(hardlockup_si_mask);
>               if (hardlockup_panic)
>                       nmi_panic(regs, "Hard LOCKUP");
> 
>+              sys_info_reset();
>+
>               wd_end_reporting();
>       }
>       /*
>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>index a5bc3ea3d44b..ad43548c75dd 100644
>--- a/include/linux/sys_info.h
>+++ b/include/linux/sys_info.h
>@@ -18,6 +18,9 @@
> #define SYS_INFO_BLOCKED_TASKS                0x00000080
> 
> void sys_info(unsigned long si_mask);
>+void sys_info_done(unsigned long si_mask);
>+void sys_info_reset(void);
>+bool sys_info_is_done(unsigned long si_mask);
> unsigned long sys_info_parse_param(char *str);
> 
> #ifdef CONFIG_SYSCTL
>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>index 6fcc94ce4ca9..dbb6a27770f5 100644
>--- a/kernel/hung_task.c
>+++ b/kernel/hung_task.c
>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> 
>       if (hung_task_call_panic)
>               panic("hung_task: blocked tasks");
>+
>+      sys_info_reset();
> }
> 
> static long hung_timeout_jiffies(unsigned long last_checked,
>diff --git a/kernel/panic.c b/kernel/panic.c
>index 213725b612aa..86ce17f03da2 100644
>--- a/kernel/panic.c
>+++ b/kernel/panic.c
>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>  */
> static void panic_other_cpus_shutdown(bool crash_kexec)
> {
>-      if (panic_print & SYS_INFO_ALL_BT)
>+      if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
>               panic_trigger_all_cpu_backtrace();
>+              sys_info_done(SYS_INFO_ALL_BT);
>+      }
> 
>       /*
>        * Note that smp_send_stop() is the usual SMP shutdown function,
>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>index 87dd5e0f6968..f431087c68a7 100644
>--- a/kernel/watchdog.c
>+++ b/kernel/watchdog.c
>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> 
>       if (hardlockup_all_cpu_backtrace) {
>               trigger_allbutcpu_cpu_backtrace(cpu);
>+              sys_info_done(SYS_INFO_ALL_BT);
>               if (!hardlockup_panic)
>                       clear_bit_unlock(0, &hard_lockup_nmi_warn);
>       }
> 
>-      sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>+      sys_info(hardlockup_si_mask);
>       if (hardlockup_panic)
>               nmi_panic(regs, "Hard LOCKUP");
> 
>+      sys_info_reset();
>+
>       per_cpu(watchdog_hardlockup_warned, cpu) = true;
> }
> 
>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> 
>               if (softlockup_all_cpu_backtrace) {
>                       trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>+                      sys_info_done(SYS_INFO_ALL_BT);
>                       if (!softlockup_panic)
>                               clear_bit_unlock(0, &soft_lockup_nmi_warn);
>               }
> 
>               add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>-              sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>+              sys_info(softlockup_si_mask);
>               thresh_count = duration / get_softlockup_thresh();
> 
>               if (softlockup_panic && thresh_count >= softlockup_panic)
>                       panic("softlockup: hung tasks");
>+
>+              sys_info_reset();
>       }
> 
>       return HRTIMER_RESTART;
>diff --git a/lib/sys_info.c b/lib/sys_info.c
>index f32a06ec9ed4..f8e6176fae75 100644
>--- a/lib/sys_info.c
>+++ b/lib/sys_info.c
>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
>               show_state_filter(TASK_UNINTERRUPTIBLE);
> }
> 
>+static unsigned long sys_info_done_mask;
>+
>+void sys_info_done(unsigned long si_mask)
>+{
>+      sys_info_done_mask |= si_mask;
>+}
>+
>+void sys_info_reset(void)
>+{
>+      sys_info_done_mask = 0;
>+}
>+
>+bool sys_info_is_done(unsigned long si_mask)
>+{
>+      return (sys_info_done_mask & si_mask) == si_mask;
>+}
>+
> void sys_info(unsigned long si_mask)
> {
>-      __sys_info(si_mask ? : kernel_si_mask);
>+      unsigned long mask;
>+
>+      if (si_mask)
>+              mask = si_mask & ~sys_info_done_mask;
>+      else
>+              mask = kernel_si_mask & ~sys_info_done_mask;
>+
>+      if (!mask)
>+              return;
>+
>+      __sys_info(mask);
>+      sys_info_done(mask);
> }
>
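
On the races mentioned in the commit message: if they ever turn out to
matter, one option might be to claim the bits atomically before printing,
so that two callers never dump the same thing. Below is a rough C11
userspace model of that idea. It is not what the POC does, and
demo_claim() and demo_claim_reset() are hypothetical names.

```c
#include <stdatomic.h>

/* Bits that some caller has already claimed for the current event. */
static atomic_ulong demo_claimed_mask;

/*
 * Atomically claim the bits in @mask that nobody has claimed yet.
 * Returns the subset that the caller now owns and should print.
 */
static unsigned long demo_claim(unsigned long mask)
{
	unsigned long old = atomic_fetch_or(&demo_claimed_mask, mask);

	return mask & ~old;
}

/* Forget all claims, e.g. when a lockup report ends without a panic. */
static void demo_claim_reset(void)
{
	atomic_store(&demo_claimed_mask, 0);
}
```

A caller would print only the bits returned by demo_claim(). The trade-off
is that a bit is marked before its output is complete, so a concurrent
caller skips it even while the first dump is still in progress.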

Thanks!
