Hello Tejun,

On (11/07/17 05:23), Tejun Heo wrote:
> Hello, Sergey.
>
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> >
> >     while (!READ_ONCE(console_waiter))
> >
> > should be
> >
> >     while (READ_ONCE(console_waiter))
> >
> > is this the "tweaking" you are talking about?
>
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything. The problem that the repro demonstrates is a
> message deluge involving an non-sleepable flusher + local irq (or
> other atomic contexts) message producer.
>
> In the above case, none of the involved contexts can serve as the
> flusher for a long time without messing up the system. If you wanna
> allow printks to be async without falling into these lockups, you
> gotta introduce an independent safe context to flush from.

we are in agreement. I Cc-ed you to another thread, let's merge
discussions.

> > > > there are some concerns, like a huge number of printk-s
> > > > happening while console_sem is locked. e.g.
> > > > console_lock()/console_unlock() on one of the CPUs, or
> > > > console_lock(); printk(); ... printk(); console_unlock();
> > >
> > > Unless we make all messages fully synchronous, I don't think
> > > there's a good solution for that and I don't think we wanna make
> > > everything fully synchronous.
> >
> > this is where it becomes complicated. offloading logic is not binary,
> > unfortunately. we normally want to offload; but not always. things
> > like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
> > regardless the consequences. some of sysrq prints out even do
> > touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
> > printk-kthread patch set tries to consider those cases and to avoid
> > any offloading.
>
> Yeah, sure, selectively opting out of asynchronous operation is a
> different (solvable) issue. Also, just to be clear, the proposed
> patch doesn't make any of these worse in any meaningful way - e.g. we
> could end up trapping a nice 20 task pinned to an overloaded CPU in
> the flusher role.
>
> The following is a completely untested patch to show how we can put
> the console in full sync mode, just the general idea. I'm a bit
> skeptical we really wanna do this given that we already (with or
> without the patch) stay sync for most of these events due to the way
> we go async, but, yeah, if we wanna do that, we can do that.

we've been going in a slightly different direction in printk-kthread.

we keep printk sync by default [as opposed to previous "immediately
offload" approach]. people asked for it, some people demanded it. we
offload to printk-kthread only when we detect that this particular
task on this particular CPU has been doing printing (without
rescheduling) for 1/2 of watchdog threshold value.
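
a rough userspace sketch of that check, only to illustrate the idea.
all names here (print_tracker, should_offload(), note_rescheduled(),
force_sync) are made up for this example and are not taken from the
printk-kthread patches. force_sync stands in for the sysrq / kexec /
late PM cases that want to stay fully sync, and the time source and
the threshold are plain parameters rather than real kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping: who has been printing, where, and since when. */
struct print_tracker {
	int      task_id;   /* task that owns the current printing streak */
	int      cpu;       /* CPU that task is printing from */
	uint64_t start_ns;  /* start of the uninterrupted streak */
};

/*
 * Return true when flushing should be handed over to the printing
 * kthread: the same task on the same CPU has been printing, without
 * rescheduling, for at least half of the watchdog threshold.
 */
static bool should_offload(struct print_tracker *t, int task_id, int cpu,
			   uint64_t now_ns, uint64_t watchdog_thresh_ns,
			   bool force_sync)
{
	/* Assumed opt-out for callers that must stay synchronous. */
	if (force_sync)
		return false;

	/* A different task or CPU starts a new streak. */
	if (t->task_id != task_id || t->cpu != cpu) {
		t->task_id = task_id;
		t->cpu = cpu;
		t->start_ns = now_ns;
		return false;
	}

	return now_ns - t->start_ns >= watchdog_thresh_ns / 2;
}

/* Call when the printing task got rescheduled: the streak starts over. */
static void note_rescheduled(struct print_tracker *t, uint64_t now_ns)
{
	t->start_ns = now_ns;
}
```

with a 10 second threshold, for example, this keeps printing sync for
the first 5 seconds of an uninterrupted streak and asks for offloading
after that.
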
IOW, if we see that we are heading towards the lockup limit then we
offload. otherwise - we let it loop in console_unlock().

	-ss