On Fri 2017-03-24 10:59:36, Sergey Senozhatsky wrote:
> On (03/23/17 09:51), Peter Zijlstra wrote:
> [..]
> > > > sysrq runs from interrupt context, right? Should be able to do wakeups.
> > >
> > > what I though about was -
> > > what if there are 'misbehaving' higher prio tasks all the time?
> > > the existing sysrq would attempt to do printing from irq context
> > > so it doesn't care about run queues.
> > >
> > > does it make sense to you?
> >
> > Ah, that's what you meant. Yeah, dunno, I'm still unconvinced about the
> > whole printk thread thing.
>
> I see your point.
> but I can't think of alternatives that would fix all those lockups and
> stalls and at the same time have better guarantees than printk_kthread.
>
>
> > Also those function names are horrifically long.
>
> right. not happy with the naming either.
>
> so what I'm thinking about right now is:
>
> we have that thing which we call "old printk" mode, which is not
> really informative. and my proposal is rename "old" mode and use
> "printk rescue" mode instead. because we switch to that mode when
> we are trying to "rescue" kernel logs. so the API can be something
> like
> 	printk_rescue_on()
> 	printk_rescue_off()

Sounds good to me. A slight problem is that off() does not actually
stop the mode when the calls are nested.

Just one more attempt inspired by this:

	printk_emergency_begin()
	printk_emergency_end()

Note that we actually start this mode automatically also with
a pr_emerg() message. But I am fine with any of the generic names
mentioned.

>
> 	--- random thoughts ---
>
> another thing that bothers me a bit is that we need to place those
> printk_rescue_on/printk_rescue_off switches all over the kernel.
> sort of a root cause [in some of the cases] here is the fact that
> we don't have any feedback from printk_kthread in vprintk_emit():
> 	does printk_kthread make any progress?
> 	do we flush messages to the serial console?
> 	etc.
>
> and we've got everything we need to have such a feedback in
> vprintk_emit():
>
> a) console is not suspended so console_unlock() can call console drivers
> b) printk_kthread != NULL
> c) we are not in enforced rescue/emergency mode
> d) `log_next_seq' moves forward (always `true', we are in
>    vprintk_emit())
> e) `console_seq' stands still
>
> so we can have an automatic rescue mode fallback in vprintk_emit().
> if (a)-(e) are true then we give up on waking up printk_kthread,
> switch to rescue mode and attempt to console_trylock() directly from
> vprintk_emit(). the part that sucks here is that we need to give
> printk_kthread some time to catch up. for instance, if (e) is true
> for the past 50 invocations of vprintk_emit(), IOW:
>
> 	- we added 50 lines to printk
> 	- none have been printed on the serial console
>
> then we
> 	- declare rescue
> 	- do console_trylock() instead of wake_up()  //unless in deferred
> 	                                               vprintk_emit()

I am not sure if we are able to distinguish a flood of messages
from a real emergency situation. If we start flushing messages
directly when there is a flood of messages, we will bring back
the original problem with soft lockups.

Well, there is a handful of annotated locations at the moment.
I would start thinking of an automatic detection once we have
more of them and have more data for a good heuristic.

I still would like to see the kernel parameter/sysfs knob that
would allow forcing the rescue/emergency mode all the time ;-)

Best Regards,
Petr
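
For illustration only, the two ideas from this thread (nested
begin()/end() and the "no console progress for 50 emits" fallback)
could be modelled in plain userspace C roughly as below. This is a
sketch under assumptions, not kernel code: struct printk_state,
auto_rescue_check(), printk_in_emergency() and STALL_LIMIT are
hypothetical names, and the fields only mirror conditions (a)-(e)
quoted above. It does not address the concern that a flood of
messages looks the same as a real emergency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold, taken from the "50 invocations" example. */
#define STALL_LIMIT 50

/* Hypothetical model of the state that vprintk_emit() could look at. */
struct printk_state {
	int emergency_depth;       /* nesting count for begin()/end() */
	bool console_suspended;    /* (a) is false when this is set */
	bool have_kthread;         /* (b) printk_kthread != NULL */
	uint64_t log_next_seq;     /* (d) sequence of the next stored message */
	uint64_t console_seq;      /* (e) sequence of the next printed message */
	uint64_t last_console_seq; /* console_seq seen on the previous emit */
	unsigned int stalled;      /* consecutive emits without console progress */
};

/* A counter instead of an on/off flag, so nested sections work. */
static void printk_emergency_begin(struct printk_state *s)
{
	s->emergency_depth++;
}

static void printk_emergency_end(struct printk_state *s)
{
	assert(s->emergency_depth > 0); /* catch an unbalanced end() */
	s->emergency_depth--;
}

static bool printk_in_emergency(const struct printk_state *s)
{
	return s->emergency_depth > 0;
}

/*
 * Models one vprintk_emit() call. Returns true when (a)-(e) have held
 * for STALL_LIMIT consecutive calls, i.e. when the caller would give up
 * on wake_up() and try console_trylock() directly.
 */
static bool auto_rescue_check(struct printk_state *s)
{
	s->log_next_seq++; /* (d): a new message was just stored */

	/* (e) does not hold: the console made progress, start over. */
	if (s->console_seq != s->last_console_seq) {
		s->last_console_seq = s->console_seq;
		s->stalled = 0;
		return false;
	}

	/* (a), (b) or (c) does not hold: the fallback does not apply. */
	if (s->console_suspended || !s->have_kthread ||
	    printk_in_emergency(s)) {
		s->stalled = 0;
		return false;
	}

	s->stalled++;
	return s->stalled >= STALL_LIMIT;
}
```

With a zero-initialized state and have_kthread set, the check stays
false for the first 49 calls and becomes true on the 50th; any change
of console_seq resets the count. The depth counter is what makes
end() safe to call from a nested section.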