On Fri 2018-06-01 13:40:50, Sergey Senozhatsky wrote:
> On (05/31/18 14:21), Petr Mladek wrote:
> > >
> > > Upstream printk has no printing kthread. And we also run
> > > printk()->console_unlock() with disabled preemption.
> >
> > Yes, the comment was wrong
>
> Yes, that was the only thing I meant.
> I really didn't have any time to look at the patch yesterday, just
> commented on the most obvious thing.

Fair enough.

> > but the problem is real.
>
> Yep, could be. But not exactly the way it is described in the commit
> messages and the patch does not fully address the problem.
>
> The patch assumes that all those events happen sequentially. While
> in reality they can happen in parallel on different CPUs.
>
> Example:
>
>       CPU0                                    CPU1
>
>       set console verbose
>
>       dump_backtrace()
>       {
>               // for (;;)  print frames
>               printk("%pS\n", frame0);
>               printk("%pS\n", frame1);
>               printk("%pS\n", frame2);
>               printk("%pS\n", frame3);
>               ...                             console_loglevel =
>                                               CONSOLE_LOGLEVEL_SILENT;
>               printk("%pS\n", frame12);
>               printk("%pS\n", frame13);
>       }
>
> Part of backtrace or the entire backtrace will be missed, because
> we read the global console_loglevel. The problem is still there.

[...]

> So I'd say that most likely the following scenarios can suffer:
>
> - NMI comes in, sets loglevel to X, printk-s some data, restores the
>   loglevel back to Y
> - IRQ comes in [like sysrq, etc] comes in and does the same thing
> - software exception comes in and does the same thing [e.g. bust_spinlocks()
>   at arch/s390/mm/fault.c]

My view is:

The race with another printk() (the console_lock owner) is much more
likely than a race between two CPUs manipulating console_loglevel.

The proposed patch seems to go in the right direction. It is supposed
to fix the most likely scenario. We could block it and request
a full solution, but I wonder if it is worth it.

I am personally fine with this partial solution for now. We could
always make it better if people hit the other scenarios.

Best Regards,
Petr