On (20/06/03 14:19), Cheng Jian wrote:
> A deadlock caused by logbuf_lock occurs on panic:
>
> a) Panic CPU is running in non-NMI context
> b) Panic CPU sends out shutdown IPI via NMI vector
> c) One of the CPUs that we bring down via NMI vector holds logbuf_lock
> d) Panic CPU tries to take logbuf_lock, then deadlock occurs.
>
> We try to re-init the logbuf_lock in printk_safe_flush_on_panic()
> to avoid the deadlock, but it does not work here, because:
>
> Firstly, it is inappropriate to check num_online_cpus() here.
> When the CPUs are brought down via NMI vector, the panic CPU won't
> wait too long for other cores to stop, so when this problem
> occurs, num_online_cpus() may be greater than 1.
>
> Secondly, printk_safe_flush_on_panic() is called after the panic
> notifier callbacks, so if printk() is called in a panic notifier
> callback, the deadlock will still occur. E.g., if ftrace_dump_on_oops
> is set, we print some debug information, which will try to take the
> logbuf_lock.
>
> To avoid this deadlock, drop the num_online_cpus() check and call
> printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
> to attempt to re-init logbuf_lock from the panic CPU.
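
For anyone following along, the ordering problem described above can be sketched as a small userspace model. This is not kernel code: the flag standing in for logbuf_lock and every function name below are hypothetical, and the num_online_cpus() check is left out. It only shows why the re-init has to happen before the notifiers run.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for logbuf_lock: true means some CPU holds it. */
static bool logbuf_locked;

/* Models a CPU that was stopped via NMI while it held the lock. */
static void stop_cpu_holding_lock(void)
{
	logbuf_locked = true;
}

/* Models the lock re-init done from the panic CPU. */
static void reinit_logbuf_lock(void)
{
	logbuf_locked = false;
}

/*
 * Models printk() called from a panic notifier. Returns false in the
 * case where a real kernel would spin on the lock forever.
 */
static bool notifier_printk(void)
{
	if (logbuf_locked)
		return false;

	logbuf_locked = true;
	puts("panic notifier message");
	logbuf_locked = false;
	return true;
}

/* Returns true if the modelled panic path completes without deadlock. */
static bool panic_path(bool reinit_before_notifiers)
{
	bool ok;

	stop_cpu_holding_lock();

	if (reinit_before_notifiers)
		reinit_logbuf_lock();

	ok = notifier_printk();

	if (!reinit_before_notifiers)
		reinit_logbuf_lock();	/* too late for the notifier */

	return ok;
}
```

In this model panic_path(true) gets through the notifier, while panic_path(false) hits the stale lock, which corresponds to the "Secondly" case in the quoted description.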

We will hopefully get rid of some of these locks (maybe around the 5.9
kernel), so the deadlocks (at least in the printk code) should become
less common.

	-ss