On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]> wrote:
>On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>>> > panic_other_cpus_shutdown() handles SYS_INFO_ALL_BT before stopping
>>> > the other CPUs. Do not ask sys_info() to handle that bit again later
>>> > in the panic path.
>>> > 
>>> > Use sys_info_with_filter() so panic_print=all_bt does not request
>>> > more output after the CPUs are stopped.
>>> > 
>>> > Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
>>> > Cc: [email protected]
>>> > Signed-off-by: Bradley Morgan <[email protected]>
>>> > ---
>>> >  kernel/panic.c | 2 +-
>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> > 
>>> > diff --git a/kernel/panic.c b/kernel/panic.c
>>> > index 213725b612aa..eb842823df61 100644
>>> > --- a/kernel/panic.c
>>> > +++ b/kernel/panic.c
>>> > @@ -680,7 +680,7 @@ void vpanic(const char *fmt, va_list args)
>>> >    */
>>> >   atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>> >  
>>> > - sys_info(panic_print);
>>> > + sys_info_with_filter(panic_print, SYS_INFO_ALL_BT);
>>> 
>>> Hmm, this completely prevents printing backtraces from all CPUs.
>>> But what if they had not been printed earlier?
>>> 
>>> They might be printed by:
>>> 
>>> static void panic_other_cpus_shutdown(bool crash_kexec)
>>> {
>>>     if (panic_print & SYS_INFO_ALL_BT)
>>>             panic_trigger_all_cpu_backtrace();
>>> 
>>> [...]
>>> }
>>> 
>>> But it checks only the "panic_print" variable. It won't do anything
>>> when (panic_print == 0).
>>> 
>>> In this case, we might still want to print the backtraces when
>>> SYS_INFO_ALL_BT is set in kernel_si_mask.
>>> 
>>> >   kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>>> 
>>> Of course, we might fix panic_other_cpus_shutdown() to also check
>>> kernel_si_mask.
>>> 
>>> But it all becomes very hairy. We have several levels:
>>> 
>>>    + watchdog-specific all_bt option, e.g. sysctl_hardlockup_all_cpu_backtrace
>>> 
>>>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>>> 
>>>    + panic-specific si_info: panic_print
>>> 
>>>    + universal fallback for any layer: kernel_si_mask
>>> 
>>> Now, we check all these variables back and forth, either to
>>> trigger all-CPU backtraces or to avoid triggering them.
>>> This clearly does not work well, and the code gets more and more
>>> hairy.
>>> 
>>> I am thinking about another approach. The word "waterfall" comes to my mind.
>>> Instead of checking all the settings back and forth, let's process
>>> each setting one by one and just remember what has been done and
>>> skip this in the next level.
>>> 
>>> All the si_info actions seem to dump a global system state.
>>> So, it would make sense to remember the state in a global variable
>>> even though it might be modified by multiple CPUs in parallel.
>>> 
>>> I am going to think more about it.
>>
>>I have created a POC using Gemini. I haven't tested it.
>>But it looks acceptable. And the logic seems to be more
>>straightforward.
>>
>>One drawback is that it requires adding a sys_info_reset()
>>call to all sys_info() callers. It is fine in principle,
>>but it might complicate backporting because all the changes
>>have to be done in one patch.
>>
>>But honestly, this is a nice-to-have fix. Most people could
>>live happily without it.
>>
>>From 3c66436d9978030845a96bfaedd6b914536e2ac4 Mon Sep 17 00:00:00 2001
>>From: Petr Mladek <[email protected]>
>>Date: Fri, 26 Jun 2026 13:55:41 +0200
>>Subject: [POC] sys_info: Introduce state-tracking APIs to prevent
>> duplicate backtraces
>>
>>In watchdog, panic, and hung task detection scenarios, sys_info() can
>>be called multiple times or alongside direct backtrace triggers like
>>trigger_allbutcpu_cpu_backtrace(). This results in identical backtraces
>>being dumped repeatedly from all CPUs, cluttering the kernel log and
>>delaying or obscuring critical debug details.
>>
>>Introduce a state tracking bitmask and associated helpers:
>>- sys_info_done(mask): Marks specific sys_info bits as already printed.
>>- sys_info_reset(): Resets the tracking state.
>>- sys_info_is_done(mask): Checks if all bits in the mask have been
>>  printed.
>>
>>Update sys_info() to automatically filter out already printed bits
>>using this state. Integrate these APIs with the generic hardlockup
>>and softlockup watchdogs, the PowerPC watchdog, the hung task detector,
>>and the panic core. This ensures that each piece of system information
>>and backtrace output is printed at most once per lockup/panic event,
>>and the state is reset cleanly when a lockup does not trigger a panic.
>>
>>Races between sys_info() callers are ignored. This should be acceptable
>>because the output from various watchdogs has never been synchronized,
>>and panic() never returns.
>>
>>Assisted-by: gemini-1.5-flash ?
>
>Why not use gemini 3.5 flash?
>
>I can try if you want. 
>
>Could I have the prompt you used? :)
>
>>Signed-off-by: Petr Mladek <[email protected]>
>>---
>> arch/powerpc/kernel/watchdog.c | 13 ++++++++++---
>> include/linux/sys_info.h       |  3 +++
>> kernel/hung_task.c             |  2 ++
>> kernel/panic.c                 |  4 +++-
>> kernel/watchdog.c              | 10 ++++++++--
>> lib/sys_info.c                 | 30 +++++++++++++++++++++++++++++-
>> 6 files changed, 55 insertions(+), 7 deletions(-)
>>
>>diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>>index c40c69368476..0eab7894b9dc 100644
>>--- a/arch/powerpc/kernel/watchdog.c
>>+++ b/arch/powerpc/kernel/watchdog.c
>>@@ -239,6 +239,7 @@ static void watchdog_smp_panic(int cpu)
>>      if (sysctl_hardlockup_all_cpu_backtrace ||
>>          (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>>              trigger_allbutcpu_cpu_backtrace(cpu);
>>+             sys_info_done(SYS_INFO_ALL_BT);
>>              cpumask_clear(&wd_smp_cpus_ipi);
>>      } else {
>>              /*
>>@@ -251,10 +252,12 @@ static void watchdog_smp_panic(int cpu)
>>              }
>>      }
>> 
>>-     sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+     sys_info(hardlockup_si_mask);
>>      if (hardlockup_panic)
>>              nmi_panic(NULL, "Hard LOCKUP");
>> 
>>+     sys_info_reset();
>>+
>>      wd_end_reporting();
>> 
>>      return;
>>@@ -419,13 +422,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>>              xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>> 
>>              if (sysctl_hardlockup_all_cpu_backtrace ||
>>-                 (hardlockup_si_mask & SYS_INFO_ALL_BT))
>>+                 (hardlockup_si_mask & SYS_INFO_ALL_BT)) {
>>                      trigger_allbutcpu_cpu_backtrace(cpu);
>>+                     sys_info_done(SYS_INFO_ALL_BT);
>>+             }
>> 
>>-             sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+             sys_info(hardlockup_si_mask);
>>              if (hardlockup_panic)
>>                      nmi_panic(regs, "Hard LOCKUP");
>> 
>>+             sys_info_reset();
>>+
>>              wd_end_reporting();
>>      }
>>      /*
>>diff --git a/include/linux/sys_info.h b/include/linux/sys_info.h
>>index a5bc3ea3d44b..ad43548c75dd 100644
>>--- a/include/linux/sys_info.h
>>+++ b/include/linux/sys_info.h
>>@@ -18,6 +18,9 @@
>> #define SYS_INFO_BLOCKED_TASKS               0x00000080
>> 
>> void sys_info(unsigned long si_mask);
>>+void sys_info_done(unsigned long si_mask);
>>+void sys_info_reset(void);
>>+bool sys_info_is_done(unsigned long si_mask);
>> unsigned long sys_info_parse_param(char *str);
>> 
>> #ifdef CONFIG_SYSCTL
>>diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>index 6fcc94ce4ca9..dbb6a27770f5 100644
>>--- a/kernel/hung_task.c
>>+++ b/kernel/hung_task.c
>>@@ -354,6 +354,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>> 
>>      if (hung_task_call_panic)
>>              panic("hung_task: blocked tasks");
>>+
>>+     sys_info_reset();
>> }
>> 
>> static long hung_timeout_jiffies(unsigned long last_checked,
>>diff --git a/kernel/panic.c b/kernel/panic.c
>>index 213725b612aa..86ce17f03da2 100644
>>--- a/kernel/panic.c
>>+++ b/kernel/panic.c
>>@@ -550,8 +550,10 @@ static void panic_trigger_all_cpu_backtrace(void)
>>  */
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>>-     if (panic_print & SYS_INFO_ALL_BT)
>>+     if ((panic_print & SYS_INFO_ALL_BT) && !sys_info_is_done(SYS_INFO_ALL_BT)) {
>>              panic_trigger_all_cpu_backtrace();
>>+             sys_info_done(SYS_INFO_ALL_BT);
>>+     }
>> 
>>      /*
>>       * Note that smp_send_stop() is the usual SMP shutdown function,
>>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>index 87dd5e0f6968..f431087c68a7 100644
>>--- a/kernel/watchdog.c
>>+++ b/kernel/watchdog.c
>>@@ -282,14 +282,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>> 
>>      if (hardlockup_all_cpu_backtrace) {
>>              trigger_allbutcpu_cpu_backtrace(cpu);
>>+             sys_info_done(SYS_INFO_ALL_BT);
>>              if (!hardlockup_panic)
>>                      clear_bit_unlock(0, &hard_lockup_nmi_warn);
>>      }
>> 
>>-     sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+     sys_info(hardlockup_si_mask);
>>      if (hardlockup_panic)
>>              nmi_panic(regs, "Hard LOCKUP");
>> 
>>+     sys_info_reset();
>>+
>>      per_cpu(watchdog_hardlockup_warned, cpu) = true;
>> }
>> 
>>@@ -895,16 +898,19 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> 
>>              if (softlockup_all_cpu_backtrace) {
>>                      trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>>+                     sys_info_done(SYS_INFO_ALL_BT);
>>                      if (!softlockup_panic)
>>                              clear_bit_unlock(0, &soft_lockup_nmi_warn);
>>              }
>> 
>>              add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
>>-             sys_info(softlockup_si_mask & ~SYS_INFO_ALL_BT);
>>+             sys_info(softlockup_si_mask);
>>              thresh_count = duration / get_softlockup_thresh();
>> 
>>              if (softlockup_panic && thresh_count >= softlockup_panic)
>>                      panic("softlockup: hung tasks");
>>+
>>+             sys_info_reset();
>>      }
>> 
>>      return HRTIMER_RESTART;
>>diff --git a/lib/sys_info.c b/lib/sys_info.c
>>index f32a06ec9ed4..f8e6176fae75 100644
>>--- a/lib/sys_info.c
>>+++ b/lib/sys_info.c
>>@@ -160,7 +160,35 @@ static void __sys_info(unsigned long si_mask)
>>              show_state_filter(TASK_UNINTERRUPTIBLE);
>> }
>> 
>>+static unsigned long sys_info_done_mask;
>>+
>>+void sys_info_done(unsigned long si_mask)
>>+{
>>+     sys_info_done_mask |= si_mask;
>>+}
>>+
>>+void sys_info_reset(void)
>>+{
>>+     sys_info_done_mask = 0;
>>+}
>>+
>>+bool sys_info_is_done(unsigned long si_mask)
>>+{
>>+     return (sys_info_done_mask & si_mask) == si_mask;
>>+}
>>+
>> void sys_info(unsigned long si_mask)
>> {
>>-     __sys_info(si_mask ? : kernel_si_mask);
>>+     unsigned long mask;
>>+
>>+     if (si_mask)
>>+             mask = si_mask & ~sys_info_done_mask;
>>+     else
>>+             mask = kernel_si_mask & ~sys_info_done_mask;
>>+
>>+     if (!mask)
>>+             return;
>>+
>>+     __sys_info(mask);
>>+     sys_info_done(mask);
>> }
>>
>
>Thanks!


Hmm... a new idea:

kernel/dump_filter.c?

This file could implement a generic lockup state machine, so that any
subsystem can record what it has already dumped and later layers can
skip it.


I know it may add some bloat, but it's better than cramming in more fixes.

What do you guys think? Maybe we could start an RFC for this?

Thanks!
