Hello,

On (07/13/16 08:39), Viresh Kumar wrote:
[..]
> Maybe not, as this can still lead to the original bug we were all
> chasing. This may hog some other CPU if we are doing excessive
> printing in suspend :(

excessive printing is just part of the problem here. if we can
cond_resched() in console_unlock() (IOW, we execute console_unlock()
with preemption and interrupts enabled) then everything must be ok,
and *from printing POV* there is no difference whether it's
printk_kthread or anything else in this case. the difference comes in
when the original console_unlock() is executed with preemption/irq
disabled; then offloading it to a schedulable printk_kthread is the
right thing.

> suspend_console() is called quite early, so for example in my case we
> do lots of printing during suspend (not from the suspend thread, but
> an IRQ handled by the USB subsystem, which removes a bus with help of
> some other thread probably).

a silly question -- can we suspend consoles later?

part of suspend/hibernation is cpu_down(), which lands in
console_cpu_notify(), which does synchronous printing for every CPU
taken down:

    static int console_cpu_notify(struct notifier_block *self,
            unsigned long action, void *hcpu)
    {
            switch (action) {
            case CPU_ONLINE:
            case CPU_DEAD:
            case CPU_DOWN_FAILED:
            case CPU_UP_CANCELED:
                    console_lock();
                    console_unlock();
                    ^^^^^^^^^^^^^^
            }
            return NOTIFY_OK;
    }

console_unlock() is synchronous (I posted a very early draft patch
that makes it asynchronous, but that's future work). so if there is
a ton of printk()-s, then console_unlock() will print them, 100%
guaranteed. even if printk_kthread is doing the printing job at the
moment, the cpu down path will wait for it to stop, lock the console
semaphore, and go into the console_unlock() printing loop.

in the printk that you have posted, that will happen not only for
CPU_DEAD, but for CPU_DYING as well (possibly; there is a
/* invoked with preemption disabled, so defer */ comment, so maybe
you never end up doing a direct printk there, but then you schedule
a console_unlock() work).

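to illustrate the synchronous flush described above, here is a minimal
userspace sketch. it is not kernel code: model_printk(),
model_console_unlock() and model_cpu_down() are hypothetical names made
up for this example, the fixed-size buffer is an assumption, and there
is no real locking. it only models the one property that matters here:
whoever calls the unlock path has to print everything that is pending
before it can return.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define LOG_SLOTS 64

static const char *log_buf[LOG_SLOTS];	/* pending messages */
static unsigned int log_head;		/* next slot to write */
static unsigned int log_tail;		/* next slot to print */
static unsigned int printed;		/* total messages sent to the "console" */

/* Model of printk(): only stores the message, printing happens later. */
static void model_printk(const char *msg)
{
	/* sketch only: a full buffer is treated as a bug, not handled */
	assert(log_head - log_tail < LOG_SLOTS);
	log_buf[log_head % LOG_SLOTS] = msg;
	log_head++;
}

/* Model of console_unlock(): drains everything pending before returning. */
static unsigned int model_console_unlock(void)
{
	unsigned int n = 0;

	while (log_tail != log_head) {
		fputs(log_buf[log_tail % LOG_SLOTS], stdout);
		log_tail++;
		printed++;
		n++;
	}
	return n;	/* how many messages this caller had to print */
}

/*
 * Model of the CPU notifier: console_lock() + console_unlock() means
 * "flush now". The real console_lock() would first sleep until the
 * current owner (e.g. a printing thread) lets go; that wait is omitted.
 */
static unsigned int model_cpu_down(void)
{
	return model_console_unlock();
}
```

in this model, if ten messages are queued before model_cpu_down() is
called, that one call prints all ten, and the next call finds nothing
left to do. the cost lands on whichever path happens to take the lock.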
> That is why my Hacky patch tried to do it after devices are removed
> and irqs are disabled, but before syscore users are suspended (and
> timekeeping is one of them). And so it fixes it for me completely.
>
> IOW, we should switch back to synchronous printing after disabling
> interrupts on the last running CPU.
>
> And I of course agree with Rafael that we would need something similar
> in Hibernation code path as well, if we choose to fix it my way.

suspend/hibernation/kexec - all covered by this patch.

	-ss