On 06/01/2021 05:35, Sergey Senozhatsky wrote:
> On (21/01/04 16:15), “William Roche wrote:
> [..]
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 332736a..eb90cc0 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -166,6 +166,15 @@ static void panic_print_sys_info(void)
>>                ftrace_dump(DUMP_ALL);
>>   }
>>
>> +/*
>> + * Force flush messages to the console.
>> + */
>> +static void panic_flush_to_console(void)
>> +{
>> +      printk_safe_flush_on_panic();
>> +      console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>> +}
>
> You must debug_locks_off() as the very first step here. But see below.

Thanks I've missed this point, I will call debug_locks_off() before
the first call to console_flush_on_panic(CONSOLE_FLUSH_PENDING);

>
>>   /**
>>    *   panic - halt the system
>>    *   @fmt: The text string to print
>> @@ -247,7 +256,7 @@ void panic(const char *fmt, ...)
>>         * Bypass the panic_cpu check and call __crash_kexec directly.
>>         */
>>        if (!_crash_kexec_post_notifiers) {
>> -              printk_safe_flush_on_panic();
>> +              panic_flush_to_console();
>>                __crash_kexec(NULL);
>
> It's dangerous to call console_flush_on_panic() before we stop secondary
> CPUs. console_flush_on_panic() ignores the state console_sem, so if any
> of the secondary is currently printing something on the consoles you'll
> get corrupted messages - we use `static char buffer` for messages we
> push to consoles.

I understand that there is a risk here, and the __crash_exec() call is
not supposed to return when crashexec is ready. This is our last chance
here to push to the console the pending messages.

> Another issue is that with this panic_flush_to_console() call console_sem
> can end up being locked once (by secondary CPU) and unlocked twice (by
> second and panic CPUs) [*]

I agree, this is a risk. But what we see is that even the message
previously submitted by the pr_emerg() call of the panic thread is
currently discarded when __crash_kexec(NULL) is called.
In the case described here, no other message is printed out by a
secondary CPU, but I understand that it could happen, and we
could avoid calling panic_flush_to_console() when kexec_crash isn't
loaded so that we can wait for the SMP stop and leave the current
steps to run untouched until the last printed messages.

>
>>                /*
>> @@ -271,9 +280,8 @@ void panic(const char *fmt, ...)
>>         */
>>        atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>
>> -      /* Call flush even twice. It tries harder with a single online CPU */
>> -      printk_safe_flush_on_panic();
>>        kmsg_dump(KMSG_DUMP_PANIC);
>> +      panic_flush_to_console();
>
> Why?

Here, we are supposed to push the pending messages, as I could verify
that they don't reach the console until the console_unblank(). So I
wanted to push them with panic_flush_to_console() before the possible
upcoming __crash_kexec() call.

Are you suggesting that I should leave the printk_safe_flush_on_panic()
call before the kmsg_dump(KMSG_DUMP_PANIC) call ?
But let's leave that untouched too.

>
>>        /*
>>         * If you doubt kdump always works fine in any situation,
>> @@ -298,7 +306,7 @@ void panic(const char *fmt, ...)
>>         * buffer.  Try to acquire the lock then release it regardless of the
>>         * result.  The release will also print the buffers out.  Locks debug
>>         * should be disabled to avoid reporting bad unlock balance when
>> -       * panic() is not being callled from OOPS.
>> +       * panic() is not being called from OOPS.
>>         */
>>        debug_locks_off();
>>        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>> @@ -314,6 +322,7 @@ void panic(const char *fmt, ...)
>>                 * We can't use the "normal" timers since we just panicked.
>>                 */
>>                pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
>> +              panic_flush_to_console();
>
> [*] So this
>
>>                for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
>>                        touch_nmi_watchdog();
>> @@ -347,6 +356,7 @@ void panic(const char *fmt, ...)
>>        disabled_wait();
>>   #endif
>>        pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>> +      panic_flush_to_console();
>
> [*] and this are both very interesting points.
>
> Those extra console_flush_on_panic() calls indicate that normal printk()
> cannot succeed in locking the console_sem so it doesn't try to
> console_unlock(): either because we killed the secondary CPU while it
> was holding the lock, or because we locked it once and unlocked it twice.
>
> I think it would make sense to just re-init console_sem, so that normal
> printk()-s will have chance to grab the console_sem lock and then we don't
> need any extra panic_flush_to_console() calls. Maybe we can do something
> like this
>
> ---
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffdd0dc7ec6d..4bd2e29c8cc0 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2638,6 +2638,7 @@ void console_flush_on_panic(enum con_flush_mode mode)
>     * context and we don't want to get preempted while flushing,
>     * ensure may_schedule is cleared.
>     */
> +  sema_init(&console_sem, 1);
>    console_trylock();
>    console_may_schedule = 0;
>
>

I've tried your suggestion, and even with this sema_init() call, the
 pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
message is not printed to the console without a subsequent call to
panic_flush_to_console().

See below a code modification that would illustrate a possible change
including the above comments. Would something like that be more
appropriate ?

This is the end of this running kernel, and panic is called from the NMI
context where printk() messages have to be pushed. The mechanism I use
seems to work in this case, but there could be another way to achieve
this goal (?)

Thanks again for your help,
William.


--- kernel/panic.c      2021-01-07 18:20:01.992582942 -0500
***************
*** 166,171 ****
--- 166,180 ----
                ftrace_dump(DUMP_ALL);
  }

+ /*
+  * Force flush messages to the console.
+  */
+ static void panic_flush_to_console(void)
+ {
+       printk_safe_flush_on_panic();
+       console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+ }
+
  /**
   *    panic - halt the system
   *    @fmt: The text string to print
***************
*** 247,253 ****
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
        if (!_crash_kexec_post_notifiers) {
!               printk_safe_flush_on_panic();
                __crash_kexec(NULL);

                /*
--- 256,265 ----
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
        if (!_crash_kexec_post_notifiers) {
!               if (kexec_crash_loaded()) {
!                       debug_locks_off();
!                       panic_flush_to_console();
!               }
                __crash_kexec(NULL);

                /*
***************
*** 284,291 ****
         *
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
!       if (_crash_kexec_post_notifiers)
                __crash_kexec(NULL);

  #ifdef CONFIG_VT
        unblank_screen();
--- 296,308 ----
         *
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
!       if (_crash_kexec_post_notifiers) {
!               if (kexec_crash_loaded()) {
!                       debug_locks_off();
!                       panic_flush_to_console();
!               }
                __crash_kexec(NULL);
+       }

  #ifdef CONFIG_VT
        unblank_screen();
***************
*** 298,304 ****
         * buffer.  Try to acquire the lock then release it regardless of the
         * result.  The release will also print the buffers out.  Locks debug
         * should be disabled to avoid reporting bad unlock balance when
!        * panic() is not being callled from OOPS.
         */
        debug_locks_off();
        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
--- 315,321 ----
         * buffer.  Try to acquire the lock then release it regardless of the
         * result.  The release will also print the buffers out.  Locks debug
         * should be disabled to avoid reporting bad unlock balance when
!        * panic() is not being called from OOPS.
         */
        debug_locks_off();
        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
***************
*** 314,319 ****
--- 331,337 ----
                 * We can't use the "normal" timers since we just panicked.
                 */
                pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+               panic_flush_to_console();

                for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
                        touch_nmi_watchdog();
***************
*** 347,352 ****
--- 365,371 ----
        disabled_wait();
  #endif
        pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+       panic_flush_to_console();

        /* Do not scroll important messages printed above */
        suppress_printk = 1;

Reply via email to