On June 26, 2026 3:47:12 PM GMT+01:00, Petr Mladek <[email protected]>
wrote:
>On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
>> On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <[email protected]>
>> wrote:
>> >On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
>> >> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]>
>> >> wrote:
>> >> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]>
>> >> >wrote:
>> >> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> >> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> >> >>> But it all becomes very hairy. We have several levels:
>> >> >>> 
>> >> >>>    + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> >> >>> 
>> >> >>>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>> >> >>> 
>> >> >>>    + panic-specific si_info: panic_print
>> >> >>> 
>> >> >>>    + universal fallback for any layer: kernel_si_info
>> >> >>> 
>> >> >>> Now, we try to check all these variables back and forth to
>> >> >>> trigger all backtraces or to avoid triggering them.
>> >> >>> And it clearly does not work well and the code is more and more
>> >> >>> hairy.
>> >> >>> 
>> >> >>> I am thinking about another approach. The word "waterfall" comes to my mind.
>> >> >>> Instead of checking all the settings back and forth, let's process
>> >> >>> each setting one by one and just remember what has been done and
>> >> >>> skip this in the next level.
>> >> >>> 
>> >> >>> All the si_info actions seem to dump a global system state.
>> >> >>> So, it would make sense to remember the state in a global variable
>> >> >>> even when it might be modified by more CPUs in parallel.
>> >> >>> 
>> >> Hmm.. new idea 
>> >> 
>> >> kernel/dump_filter.c ?
>> >> 
>> >> What this file could do is to handle a generic lockup state machine
>> >> so any subsystem can log what it already dumped?
>> >> 
>> >> I know it may bloat, but it's better than cramming fixes in.
>> >
>> >I am not sure what exactly you would like to achieve but it sounds
>> >a bit scary ;-)
>> >
>> >Anyway, we should not synchronize the watchdog reports against
>> >each other, definitely. They are running in non-compatible contexts
>> >(task vs interrupt vs NMI). Also we should not add any locking
>> >because they usually print something when the system has enough
>> >troubles.
>> >
>> >Also I think that it is not worth preventing duplicated backtraces
>> >or reports from a single CPU. IMHO, it is not a big problem
>> >in practice.
>> >
>> >So, we are down to large reports, like backtraces from all CPUs,
>> >timers, locks, ... which are handled by sys_info(). So, I think
>> >that it should be enough to handle this inside the sys_info() API.
>> >
>> >I do not want to say that my proposal was the best solution.
>> >I am sure that there are better ones. But we need to consider
>> >the gain vs. complexity.
>> >
>> >Honestly, I am already a bit scared by the complexity which
>> >the sys_info() API added. And it is hard to imagine that
>> >adding another API would make it easier. But I might be wrong.
>> >
>> >Instead, it might make sense to integrate the conflicting
>> >subsystem-specific calls under the sys_info() API.
>> >I mean that, for example watchdog_hardlockup_check() won't
>> >call trigger_allbutcpu_cpu_backtrace() directly but
>> >it would call it via sys_info() API so that sys_info()
>> >could keep track of it. Something like:
>> >
>> >void sys_info_allbutcpu_bt(int cpu)
>> >{
>> >    trigger_allbutcpu_cpu_backtrace(cpu);
>> >    /*
>> >     * The caller likely printed backtrace of the given @cpu
>> >     * on its own. Prevent duplicate backtraces from all
>> >     * CPUs with potential next sys_info() call.
>> >     */
>> >    sys_info_done(SYS_INFO_ALL_BT);
>> >}
>> >
>> >But I am not sure if it is really easier to follow
>> >than calling sys_info_done() from the watchdog code.
>> >
>> >Some watchdogs try to optimize the output and print backtraces
>> >only from CPUs which are relevant for the given lockup.
>> >We should keep the logic for selecting the set of CPUs
>> >in the watchdog code. We just need to solve how to elegantly
>> >make sys_info() aware of it or at least about the more massive
>> >reports.
>> >
>> >Anyway, I would prefer to keep it simple until we see some problems
>> >in practice.
>> >
>> >Best Regards,
>> >Petr
>> >
>> 
>> 
>> I understand that creating a new file in the first place is scary.
>> 
>> But I was a bit vague about what I wanted, and I'm sorry.
>> 
>> So, the reason why I'd suggest a new file is that if any subsystem
>> theoretically bypasses sys_info() to log a lockup, it completely misses
>> the filter and duplicates the dump.
>> 
>> My file would act as a generic lockless state machine that any
>> subsystem can update regardless of how they dump logs.
>> 
>> If you have any questions, feel absolutely free to ask! :)
>> 
>> Discussion is a way to make everyone happy!
>
>Honestly, I am more and more wondering whether you are a real person
>or an AI bot.

Sigh..

I can verify myself through video call if you don't believe I am human :)

I suggested a new file because an AI said it would be a good idea.

I told it what I should do, and it told me to create a new file.

I knew it was slightly over-engineered, but I was a bit stressed,
and I wanted some sort of new API which would be less buggy, IMHO.

I should've told you that I used AI to come up with the whole new file idea.

Really sorry, Petr.

>Best Regards,
>Petr
>

Thanks!
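
To make the "remember what has been done and skip it in the next level"
idea from the quoted discussion concrete, here is a minimal userspace
sketch. It is only an illustration under stated assumptions, not the
sys_info() code: all demo_* names and DEMO_* bits are hypothetical, and
C11 atomics stand in for whatever primitive would be safe in task,
interrupt and NMI context in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical report bits, loosely modelled on SYS_INFO_ALL_BT above. */
#define DEMO_ALL_BT	(1UL << 0)
#define DEMO_TIMERS	(1UL << 1)
#define DEMO_LOCKS	(1UL << 2)

/* Global "already dumped" state; only touched with atomic ops, no locks. */
static atomic_ulong demo_done;

/* Mark reports as printed by a caller that printed them on its own. */
static void demo_done_mark(unsigned long mask)
{
	atomic_fetch_or(&demo_done, mask);
}

/*
 * Claim the reports in @mask that nobody has printed yet and return them.
 * The fetch-or marks and reads in one atomic step, so two racing callers
 * never both get the same bit back.
 */
static unsigned long demo_claim(unsigned long mask)
{
	unsigned long old = atomic_fetch_or(&demo_done, mask);

	return mask & ~old;
}

/* Start a new report with a clean slate. */
static void demo_reset(void)
{
	atomic_store(&demo_done, 0);
}

/* Walk the levels in order: watchdog, panic, universal fallback. */
static void demo_waterfall(void)
{
	demo_reset();

	/* The watchdog printed backtraces from all CPUs on its own. */
	demo_done_mark(DEMO_ALL_BT);

	/* The panic level wants backtraces and timers: only timers remain. */
	assert(demo_claim(DEMO_ALL_BT | DEMO_TIMERS) == DEMO_TIMERS);

	/* The fallback wants everything: only locks remain. */
	assert(demo_claim(DEMO_ALL_BT | DEMO_TIMERS | DEMO_LOCKS) == DEMO_LOCKS);
}
```

Each level asks only for what it wants and prints only what demo_claim()
returns, so no level has to inspect the settings of another one. When and
where the state would be reset is left open here.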
