On (21/01/04 16:15), “William Roche wrote:
[..]
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 332736a..eb90cc0 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -166,6 +166,15 @@ static void panic_print_sys_info(void)
>               ftrace_dump(DUMP_ALL);
>  }
>  
> +/*
> + * Force flush messages to the console.
> + */
> +static void panic_flush_to_console(void)
> +{
> +     printk_safe_flush_on_panic();
> +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +}

You must debug_locks_off() as the very first step here. But see below.

>  /**
>   *   panic - halt the system
>   *   @fmt: The text string to print
> @@ -247,7 +256,7 @@ void panic(const char *fmt, ...)
>        * Bypass the panic_cpu check and call __crash_kexec directly.
>        */
>       if (!_crash_kexec_post_notifiers) {
> -             printk_safe_flush_on_panic();
> +             panic_flush_to_console();
>               __crash_kexec(NULL);

It's dangerous to call console_flush_on_panic() before we stop secondary
CPUs. console_flush_on_panic() ignores the state console_sem, so if any
of the secondary is currently printing something on the consoles you'll
get corrupted messages - we use `static char buffer` for messages we
push to consoles.

Another issue is that with this panic_flush_to_console() call console_sem
can end up being locked once (by secondary CPU) and unlocked twice (by
second and panic CPUs) [*]

>               /*
> @@ -271,9 +280,8 @@ void panic(const char *fmt, ...)
>        */
>       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>  
> -     /* Call flush even twice. It tries harder with a single online CPU */
> -     printk_safe_flush_on_panic();
>       kmsg_dump(KMSG_DUMP_PANIC);
> +     panic_flush_to_console();

Why?

>       /*
>        * If you doubt kdump always works fine in any situation,
> @@ -298,7 +306,7 @@ void panic(const char *fmt, ...)
>        * buffer.  Try to acquire the lock then release it regardless of the
>        * result.  The release will also print the buffers out.  Locks debug
>        * should be disabled to avoid reporting bad unlock balance when
> -      * panic() is not being callled from OOPS.
> +      * panic() is not being called from OOPS.
>        */
>       debug_locks_off();
>       console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> @@ -314,6 +322,7 @@ void panic(const char *fmt, ...)
>                * We can't use the "normal" timers since we just panicked.
>                */
>               pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
> +             panic_flush_to_console();

[*] So this

>               for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
>                       touch_nmi_watchdog();
> @@ -347,6 +356,7 @@ void panic(const char *fmt, ...)
>       disabled_wait();
>  #endif
>       pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> +     panic_flush_to_console();

[*] and this are both very interesting points.

Those extra console_flush_on_panic() calls indicate that normal printk()
cannot succeed in locking the console_sem so it doesn't try to
console_unlock(): either because we killed the secondary CPU while it
was holding the lock, or because we locked it once and unlocked it twice.

I think it would make sense to just re-init console_sem, so that normal
printk()-s will have chance to grab the console_sem lock and then we don't
need any extra panic_flush_to_console() calls. Maybe we can do something
like this

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ffdd0dc7ec6d..4bd2e29c8cc0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2638,6 +2638,7 @@ void console_flush_on_panic(enum con_flush_mode mode)
         * context and we don't want to get preempted while flushing,
         * ensure may_schedule is cleared.
         */
+       sema_init(&console_sem, 1);
        console_trylock();
        console_may_schedule = 0;
 

Reply via email to