Hello, Steven.

On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> From what I gathered, you said an OOM would trigger, and then the
> network console would not be able to allocate memory and it would
> trigger a printk too, and cause an infinite amount of printks.

Yeah, it falls into a back-and-forth loop between the OOM code and
the netconsole path.

> This could very well be a great place to force offloading. If a printk
> is called from within a printk, at the same context (normal, softirq,
> irq or NMI), then we should trigger the offloading.

I was thinking more of a timeout-based approach (i.e. offload if
we've been stuck for longer than X, or after X messages), but if a
local feedback loop is the only thing we're missing after your
improvements, detecting that specific condition definitely works and
is likely the better approach in terms of message delivery
guarantees.
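
Something like the following is what I'd imagine for the detection
side - purely my guess at what recursion_check_start/finish() could
look like, to make sure we mean the same thing.  printk_ctx(), the
per-CPU masks and the note_printk_entry() hook in the printk entry
path are all made up on my end, so take it as a sketch of the idea
only:

/*
 * One bit per context (task/softirq/irq/NMI), per CPU.
 * printing_ctx_mask: this CPU/context is inside call_console_drivers().
 * recursed_ctx_mask: printk() was re-entered from that same CPU and
 * context while the consoles were printing, i.e. the local feedback
 * loop.  Assumes this sits in kernel/printk/printk.c.
 */
enum {
	PRINTK_CTX_TASK,
	PRINTK_CTX_SOFTIRQ,
	PRINTK_CTX_IRQ,
	PRINTK_CTX_NMI,
};

static DEFINE_PER_CPU(unsigned long, printing_ctx_mask);
static DEFINE_PER_CPU(unsigned long, recursed_ctx_mask);

static int printk_ctx(void)
{
	if (in_nmi())
		return PRINTK_CTX_NMI;
	if (in_irq())
		return PRINTK_CTX_IRQ;
	if (in_serving_softirq())
		return PRINTK_CTX_SOFTIRQ;
	return PRINTK_CTX_TASK;
}

/* would need to be called from the printk entry path (vprintk_emit) */
static void note_printk_entry(void)
{
	unsigned long bit = BIT(printk_ctx());

	if (this_cpu_read(printing_ctx_mask) & bit)
		this_cpu_or(recursed_ctx_mask, bit);
}

static bool recursion_check_start(void)
{
	unsigned long bit = BIT(printk_ctx());

	this_cpu_or(printing_ctx_mask, bit);

	/* did a console driver printk() back at us last time around? */
	return this_cpu_read(recursed_ctx_mask) & bit;
}

static void recursion_check_finish(bool offload)
{
	unsigned long bit = BIT(printk_ctx());

	this_cpu_and(printing_ctx_mask, ~bit);
	if (offload)
		this_cpu_and(recursed_ctx_mask, ~bit);
}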

> +static void kick_offload_thread(void)
> +{
> +     /*
> +      * Consoles are triggering printks, offload the printks
> +      * to another CPU to hopefully avoid a lockup.
> +      */
> +}
...
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>       for (;;) {
>               struct printk_log *msg;
> +             bool offload;
>               size_t ext_len = 0;
>               size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>                * waiter waiting to take over.
>                */
>               console_lock_spinning_enable();
> +             offload = recursion_check_start();
>  
>               stop_critical_timings();        /* don't trace print latency */
>               call_console_drivers(ext_text, ext_len, text, len);
>               start_critical_timings();
>  
> +             recursion_check_finish(offload);
> +
>               if (console_lock_spinning_disable_and_check()) {
>                       printk_safe_exit_irqrestore(flags);
>                       return;
>               }
> +             if (offload)
> +                     kick_offload_thread();

Yeah, something like this would definitely work.
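
For kick_offload_thread() itself, the first thing that comes to mind
is a dedicated kthread poked through irq_work so the kick stays safe
from the printing context - again just a sketch on my side: the
printk_offload_* names are made up, and the actual handoff (getting
the stuck CPU to bail out of its printing loop) plus the kthread
lifecycle details are hand-waved here:

static struct task_struct *printk_offload_task;

static void printk_offload_wake(struct irq_work *work)
{
	if (printk_offload_task)
		wake_up_process(printk_offload_task);
}

static DEFINE_IRQ_WORK(printk_offload_irq_work, printk_offload_wake);

static void kick_offload_thread(void)
{
	/*
	 * Consoles are triggering printks, offload the printks
	 * to another CPU to hopefully avoid a lockup.
	 */
	irq_work_queue(&printk_offload_irq_work);
}

static int printk_offload_fn(void *unused)
{
	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();

		/*
		 * Wait until the current owner lets go of the console
		 * lock, then flush whatever piled up in the meantime.
		 */
		console_lock();
		console_unlock();
	}
	return 0;
}

static int __init printk_offload_init(void)
{
	struct task_struct *t;

	t = kthread_run(printk_offload_fn, NULL, "printk_offload");
	if (!IS_ERR(t))
		printk_offload_task = t;
	return PTR_ERR_OR_ZERO(t);
}
late_initcall(printk_offload_init);

The irq_work is only there so that the kick can be issued with the
console lock held and interrupts off; if the wakeup is already safe
from that spot, a plain wake_up_process() would do.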

Thanks a lot.

-- 
tejun
