On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
>> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]> wrote:
>> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> >>> But it all becomes very hairy. We have several levels:
>> >>>
>> >>>   + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> >>>
>> >>>   + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>> >>>
>> >>>   + panic-specific si_info: panic_print
>> >>>
>> >>>   + universal fallback for any layer: kernel_si_info
>> >>>
>> >>> Now, we try to check all these variables back and forth to
>> >>> trigger all backtraces or to avoid triggering them.
>> >>> And it clearly does not work well and the code is more and more
>> >>> hairy.
>> >>>
>> >>> I think about another approach. The word "waterfall" comes to my mind.
>> >>> Instead of checking all the settings back and forth, let's process
>> >>> each setting one by one and just remember what has been done and
>> >>> skip this in the next level.
>> >>>
>> >>> All the si_info actions seems to dump a global system state.
>> >>> So, it would make sense to remember the state in a global variable
>> >>> even when it might be modified by more CPUs in parallel.
>> >>>
>> Hmm.. new idea
>>
>> kernel/dump_filter.c ?
>>
>> What this file could do is to handle a generic lockup state machine
>> so any subsystem can log what it already dumped?
>>
>> I know it may bloat, but it's better then cramming fixes in.
>
>I am not sure what exactly you would like to achieve but it sounds
>a bit scary ;-)
>
>Anyway, we should not synchronize the watchdog reports against
>each other, definitely. They are running in non-compatible contexts
>(task vs interrupt vs NMI). Also we should not add any locking
>because they usually print something when the system has enough
>troubles.
>
>Also I think that it is not worth preventing duplicated backtraces
>or reports from a single CPU. IMHO, it is not a big problem
>in practice.
>
>So, we are down to large reports, like backtraces from all CPUs,
>timers, locks, ... which are handled by sys_info(). So, I think
>that it should be enough to handle this inside the sys_info() API.
>
>I do not want to say that my proposal was the best solution.
>I am sure that there are better ones. But we need to consider
>the gain vs. complexity.
>
>Honestly, I am already a bit scared by the complexity which
>we the sys_info() API added. And it is hard to imagine that
>adding another API would make it easier. But I might be wrong.
>
>Instead, it might make sense to integrate the conflicting
>subsystem-specific calls under the sys_info() API.
>I mean that, for example watchdog_hardlockup_check() won't
>call trigger_allbutcpu_cpu_backtrace() directly but
>it would call it via sys_info() API so that sys_info()
>could keep track of it. Something like:
>
>void sys_info_allbutcpu_bt(int cpu)
>{
>	trigger_allbutcpu_cpu_backtrace(cpu);
>	/*
>	 * The caller likely printed backtrace of the given @cpu
>	 * on its own. Prevent duplicate backtraces from all
>	 * CPUs with potential next sys_info() call.
>	 */
>	sys_info_done(SYS_INFO_ALL_BT);
>}
>
>But I am not sure if it is really easier to follow
>than calling sys_info_done() from the watchdog code.
>
>Some watchdogs try to optimize the output and print backtraces
>only from CPUs which are relevant for the given lockup.
>We should keep the logic for selecting the set of CPUs
>in the watchdog code. We just need to solve how to elegantly
>make sys_info() aware of it or at least about the more massive
>reports.
>
>Anyway, I would prefer to keep it simple until we see some problems
>in practice.
>
>Best Regards,
>Petr
>
I understand that adding a new file in the first place sounds scary.
I was a bit vague about what I wanted, and I'm sorry.

The reason I'd suggest a new file is that if any subsystem bypasses
sys_info() to log a lockup, in theory it completely misses the filter
and the dump gets duplicated.

My file would act as a generic lockless state machine that any
subsystem can update, regardless of how it dumps logs.

If you have any questions, feel absolutely free to ask! :)
Discussion is a way to make everyone happy!

Thanks!
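
P.S. To make "generic lockless state machine" less vague, here is a
minimal userspace sketch of what I have in mind. It is not a patch:
every name in it (dump_filter_claim(), dump_filter_mark_done(),
dump_filter_reset(), the DUMP_* flags) is hypothetical, and it uses
C11 <stdatomic.h> only so that it compiles outside the kernel. A real
version would presumably use the kernel's own atomic or bitop helpers.

```c
#include <stdatomic.h>

/* Hypothetical flags, one per kind of large system-wide report. */
enum dump_kind {
	DUMP_ALL_BT = 1u << 0,	/* backtraces from all CPUs */
	DUMP_TIMERS = 1u << 1,
	DUMP_LOCKS  = 1u << 2,
};

/* Global record of what has already been dumped. No lock protects it. */
static atomic_uint dump_done_mask;

/*
 * Atomically mark @kinds as dumped and return the subset that was not
 * marked before, i.e. the reports the caller should still print.
 * If two callers race, each bit is handed out to exactly one of them.
 */
unsigned int dump_filter_claim(unsigned int kinds)
{
	unsigned int old = atomic_fetch_or(&dump_done_mask, kinds);

	return kinds & ~old;
}

/* For code that already printed a report on its own, bypassing the filter. */
void dump_filter_mark_done(unsigned int kinds)
{
	atomic_fetch_or(&dump_done_mask, kinds);
}

/* Forget everything, e.g. once the lockup report is complete. */
void dump_filter_reset(void)
{
	atomic_store(&dump_done_mask, 0);
}
```

A watchdog that prints its own all-CPU backtrace would call
dump_filter_mark_done(DUMP_ALL_BT) afterwards, and a later layer would
only print what dump_filter_claim() returns. This only covers the large
global reports you mentioned, it does not try to synchronize the
watchdogs against each other, and it takes no locks. Whether this is
really simpler than calling sys_info_done() from the watchdog code is
exactly the open question.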
