On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
>
> um... hundreds of cases.
>
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).

and, Steven, one more thing. wondering what's your opinion.

suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. is the proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit() on console_owner with local IRQs disabled,
   waiting for either the non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand off
   printing.

so the current CPU, after busy-waiting for a foreign CPU's
call_console_drivers(), will go and do its own call_console_drivers().
which, time-wise, simply doubles (roughly) the amount of time that CPU
spends in printk()->console_unlock(). agreed?

if we previously could have a case when the non-atomic printk CPU would
grab the console_sem and print all atomic printk CPUs' messages first, and
then its own messages, thus atomic printk CPUs would have just log_store(),
now we will have CPUs that call_console_drivers() and spin on the
console_sem owner waiting for call_console_drivers() on a foreign CPU
[not all of them: it's one CPU doing the print out and one CPU spinning
on console_owner. but overall I think all CPUs will experience that spin
on console_sem waiting for call_console_drivers() and then do their own
call_console_drivers()].

even the two CPUs case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.

CPU0                            CPU1
printk()                        printk()
 log_store()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin
                                  call_console_drivers()
                                  sees console_waiter
                                   break

                                printk()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin

                                ....

that "wait for call_console_drivers() on another CPU and then do its own
call_console_drivers()" pattern does look dangerous. the benefit of
hand-off is really fragile sometimes, isn't it?

	-ss
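
a rough userspace model of the ping-pong above, just to put numbers on the
"roughly doubles" estimate. this is not kernel code: struct cpu_stats,
simulate_handoff() and the tick units are made up for illustration, and it
assumes every call_console_drivers() call costs the same and that the
waiter spins for the whole duration of the owner's line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU accounting; units are arbitrary "ticks". */
struct cpu_stats {
	unsigned long spin_ticks;	/* busy-waiting for the console owner */
	unsigned long print_ticks;	/* inside call_console_drivers() */
};

/*
 * Model the two-CPU hand off ping-pong: in every round the current owner
 * prints one line (line_cost ticks) while the other CPU spins for the
 * same amount of time, then ownership is handed to the spinning CPU.
 */
static void simulate_handoff(struct cpu_stats cpu[2], unsigned int rounds,
			     unsigned long line_cost)
{
	unsigned int owner = 0;
	unsigned int i;

	assert(cpu != NULL);

	for (i = 0; i < rounds; i++) {
		unsigned int waiter = !owner;

		cpu[owner].print_ticks += line_cost;
		cpu[waiter].spin_ticks += line_cost;
		owner = waiter;		/* hand off to the waiter */
	}
}
```

with 4 rounds and a line cost of 100 ticks, each CPU prints two lines and
ends up with 200 ticks of printing plus 200 ticks of spinning, i.e. about
2 * line_cost per printk(), compared to a bare log_store() when some other
context drains the logbuf.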