On (21/01/04 16:15), “William Roche wrote: [..] > diff --git a/kernel/panic.c b/kernel/panic.c > index 332736a..eb90cc0 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -166,6 +166,15 @@ static void panic_print_sys_info(void) > ftrace_dump(DUMP_ALL); > } > > +/* > + * Force flush messages to the console. > + */ > +static void panic_flush_to_console(void) > +{ > + printk_safe_flush_on_panic(); > + console_flush_on_panic(CONSOLE_FLUSH_PENDING); > +}
You must debug_locks_off() as the very first step here. But see below. > /** > * panic - halt the system > * @fmt: The text string to print > @@ -247,7 +256,7 @@ void panic(const char *fmt, ...) > * Bypass the panic_cpu check and call __crash_kexec directly. > */ > if (!_crash_kexec_post_notifiers) { > - printk_safe_flush_on_panic(); > + panic_flush_to_console(); > __crash_kexec(NULL); It's dangerous to call console_flush_on_panic() before we stop secondary CPUs. console_flush_on_panic() ignores the state console_sem, so if any of the secondary is currently printing something on the consoles you'll get corrupted messages - we use `static char buffer` for messages we push to consoles. Another issue is that with this panic_flush_to_console() call console_sem can end up being locked once (by secondary CPU) and unlocked twice (by second and panic CPUs) [*] > /* > @@ -271,9 +280,8 @@ void panic(const char *fmt, ...) > */ > atomic_notifier_call_chain(&panic_notifier_list, 0, buf); > > - /* Call flush even twice. It tries harder with a single online CPU */ > - printk_safe_flush_on_panic(); > kmsg_dump(KMSG_DUMP_PANIC); > + panic_flush_to_console(); Why? > /* > * If you doubt kdump always works fine in any situation, > @@ -298,7 +306,7 @@ void panic(const char *fmt, ...) > * buffer. Try to acquire the lock then release it regardless of the > * result. The release will also print the buffers out. Locks debug > * should be disabled to avoid reporting bad unlock balance when > - * panic() is not being callled from OOPS. > + * panic() is not being called from OOPS. > */ > debug_locks_off(); > console_flush_on_panic(CONSOLE_FLUSH_PENDING); > @@ -314,6 +322,7 @@ void panic(const char *fmt, ...) > * We can't use the "normal" timers since we just panicked. > */ > pr_emerg("Rebooting in %d seconds..\n", panic_timeout); > + panic_flush_to_console(); [*] So this > for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) { > touch_nmi_watchdog(); > @@ -347,6 +356,7 @@ void panic(const char *fmt, ...) > disabled_wait(); > #endif > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > + panic_flush_to_console(); [*] and this are both very interesting points. Those extra console_flush_on_panic() calls indicate that normal printk() cannot succeed in locking the console_sem so it doesn't try to console_unlock(): either because we killed the secondary CPU while it was holding the lock, or because we locked it once and unlocked it twice. I think it would make sense to just re-init console_sem, so that normal printk()-s will have chance to grab the console_sem lock and then we don't need any extra panic_flush_to_console() calls. Maybe we can do something like this --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ffdd0dc7ec6d..4bd2e29c8cc0 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2638,6 +2638,7 @@ void console_flush_on_panic(enum con_flush_mode mode) * context and we don't want to get preempted while flushing, * ensure may_schedule is cleared. */ + sema_init(&console_sem, 1); console_trylock(); console_may_schedule = 0;