On (12/15/17 10:08), Petr Mladek wrote:
[..]
> > > Is the above scenario really dangerous? console_lock() owner is
> > > able to sleep. Therefore there is no risk of a softlockup.
> > >
> > > Sure, many messages will get stacked in the meantime and the console
> > > owner may then get passed to another owner in atomic context. But
> > > do you really see this in the real life?
> >
> > console_sem is locked by atomic printk CPU1~CPU10. the non-atomic CPU
> > is just sleeping, waiting for the console_sem. meanwhile, the atomic
> > printk CPUs just hand off console_sem ownership to each other without
> > ever up()-ing the console_sem. what's the point of hand off here? how
> > is that going to work?
> >
> > what we need to do is to offload printing from atomic contexts to a
> > non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> > on the console_sem, ready to continue printing. but it never gets its
> > chance to do so, because CPU1~CPU10 just keep passing console_sem
> > ownership around, resulting in the same "we print from atomic
> > context" thing.
>
> Yes, I understand the scenario. The question is how much it is
> realistic.

so it's unlikely to have several CPUs on an SMP system printk-ing from
atomic contexts, while one of the available CPUs does console_lock()
or printk() from a non-atomic context?

[..]

> > which is not a justification. we are not looking for a solution that
> > merely does not make things worse. we are looking for a solution
> > that does improve things.
>
> But it does improve things! The question is if it is enough or not
> in the real life.

console_unlock() is still unbound. "spreading the printk load" between
N CPUs is just 50% of the actual problem. console_unlock() has several
sides involved: one is doing the print out, the other one is sleeping
on the console_sem, waiting for the first one to up() the console_sem.
yes, CPUs that do printk will now hand the console_sem over to each
other, but up() still does not happen for an unknown period of time.
so all of the processes that sleep in TASK_UNINTERRUPTIBLE on
console_sem - user space processes, tty, pty, drm, frame buffers, PM,
etc. - still have an unbound TASK_UNINTERRUPTIBLE sleep. we've been
talking about this for years. but there are more issues.

> Do you see a scenario where it makes things statistically or
> even deterministically worse?

I really don't have that much time right now; but I did a very quick
test on one of my boards today.

NOTE: speaking for myself only and about my observations only. you
are free to call it unrealistic, impossible, silly, etc.

One more NOTE: The board I'm playing with has immediate printk
offloading enabled. We figured out it's better to have it enabled
rather than not, after all. It makes a huge difference.

And another NOTE: I did NOT alter my setup; I did NOT perform the
stupid printk-flood type of test (while (1) printk()). I used the
"tools" we are using here every day, which don't really abuse printk.

// UPD... OK... I actually ended up spending several hours on it, much
// more than I planned. because... I was quite puzzled. I double...
// no... triple checked the backport. It's exactly what v4 posted by
// Steven does - modulo the printk_safe stuff [I don't have it
// backported].

Back to the console_owner patch.

1) it opens both soft and hard lockup vectors

I see *a lot* of cases where the CPU that calls printk in a loop does
not end up flushing its messages. And the problem seems to be -
preemption.

    CPU0                                  CPU1

    for_each_process_thread(g, p)
     printk()
      console_unlock()                    printk
       sets console_owner                  console_trylock() fails
                                           sees console_owner
                                           sets console_waiter
       call_console_drivers
       clears console_owner
       sees console_waiter
       hand off                            spins with local_irq disabled
                                           sees that it has acquired
                                            console_sem ownership
                                           enables local_irq
    printk ..                             << preemption >>
    printk ...
     unbound number of printk-s
    printk ..
    printk                                back to TASK_RUNNING
                                          goes to console_unlock()
    printk                                 local_irq_save
                                           ???  ***
                                           local_irq_restore
                                          << preemption >>

    ***  if it will see console_waiter [being in any context] it will
         hand off. otherwise, preemption again and CPU0 can add even
         more messages to logbuf.
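for reference, the owner side of the hand off the diagram is talking
about - a condensed sketch of how I read Steven's v4 (and my backport
of it); printk_safe and lockdep annotations are dropped, so don't take
it literally. the waiter side is sketched further down.

	/* console_unlock(), the printing loop */
	for (;;) {
		/* ... fetch the next pending logbuf message ... */

		raw_spin_lock(&console_owner_lock);
		console_owner = current;
		raw_spin_unlock(&console_owner_lock);

		call_console_drivers(ext_text, ext_len, text, len);

		raw_spin_lock(&console_owner_lock);
		waiter = READ_ONCE(console_waiter);
		console_owner = NULL;
		raw_spin_unlock(&console_owner_lock);

		if (waiter) {
			/*
			 * hand off: clearing console_waiter passes
			 * console_sem ownership to the spinning task.
			 * note that there is no up() here, so every
			 * task sleeping in down(&console_sem) stays
			 * in its TASK_UNINTERRUPTIBLE sleep.
			 */
			WRITE_ONCE(console_waiter, false);
			return;
		}
	}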
I added several trace points, trying to understand why this patch was
so unimpressive on my board.

console_trylock() was successful
-	trace_offloading("vprintk_emit()->trylock OK", " ", 0);

console_trylock() was unsuccessful and we went through
`if (!waiter && owner && owner != current)' under the console_owner_lock
spin_lock with local_irqs disabled
-	trace_offloading("vprintk_emit()->trylock FAIL", " will spin? ", spin);

task in console_unlock() set console_owner, under console_owner_lock
-	trace_offloading("set console_owner", " ", 0);

task in console_unlock() cleared console_owner, under console_owner_lock
-	trace_offloading("clear console_owner", " waiter != NULL ", !!waiter);

task in console_unlock() saw the waiter, broke out of the printing loop
and handed off printing
-	trace_offloading("hand off", " ", 0);

a *very small* part of the logs. a quick introduction [I changed the
names of the processes, so it'll be easier]: i_do_printks-2997 is the
process that does the printk() loop; the log starts with it being in
console_unlock(), just when it has set the console_owner.
usertaskA-1167 comes in, fails to trylock, sees the console_owner and
sets itself as console_waiter. kworker/0:0-2990 comes in, fails to
trylock console_sem, fails to set itself as console_waiter, leaves
vprintk_emit(). i_do_printks-2997 sees the console_waiter and hands
printing off to usertaskA-1167. usertaskA-1167 gets preempted.
i_do_printks-2997 continues with its printk-s, which are now just
log_store(), because console_sem is locked and its owner is preempted.

...
 i_do_printks-2997 [003] d..1   792.616378: offloading: set console_owner :0
 usertaskA-1167    [001] d..1   792.617560: offloading: vprintk_emit()->trylock FAIL will spin? :1
 kworker/0:0-2990  [000] d..1   792.618280: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.618387: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.618470: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.618478: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.618723: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.618902: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.619057: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.619064: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.620077: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.620323: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.620444: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.620927: offloading: vprintk_emit()->trylock FAIL will spin? :0
 kworker/0:0-2990  [000] d..1   792.620958: offloading: vprintk_emit()->trylock FAIL will spin? :0
 i_do_printks-2997 [003] d..1   792.629275: offloading: clear console_owner waiter != NULL :1
 i_do_printks-2997 [003] d..1   792.629280: offloading: hand off :0
 usertaskA-1167    [001] d..1   792.629281: offloading: new waiter acquired ownership:0
 i_do_printks-2997 [003] d..1   792.630639: offloading: vprintk_emit()->trylock FAIL will spin? :0
 i_do_printks-2997 [003] d..1   792.630663: offloading: vprintk_emit()->trylock FAIL will spin? :0
...
<< CUT. printks, printks, printks. boring stuff. just many printks.
   usertaskA-1167 is preempted, holding the console_sem, so
   i_do_printks-2997 just does log_store() >>
...
 i_do_printks-2997 [003] d..1   792.645663: offloading: vprintk_emit()->trylock FAIL will spin? :0
 i_do_printks-2997 [003] d..1   792.645670: offloading: vprintk_emit()->trylock FAIL will spin? :0
 usertaskA-1167    [001] d..1   792.654176: offloading: set console_owner :0
 systemd-udevd-672 [003] d..2   792.660132: offloading: vprintk_emit()->trylock FAIL will spin? :1
 usertaskA-1167    [001] d..1   792.661759: offloading: clear console_owner waiter != NULL :1
 usertaskA-1167    [001] d..1   792.661761: offloading: hand off :0
 systemd-udevd-672 [003] d..2   792.661761: offloading: new waiter acquired ownership:0
 systemd-udevd-672 [003] d..2   792.661772: offloading: set console_owner :0
 systemd-udevd-672 [003] d..2   792.675610: offloading: clear console_owner waiter != NULL :0
 systemd-udevd-672 [003] d..2   792.675628: offloading: set console_owner :0
 systemd-udevd-672 [003] d..2   792.689281: offloading: clear console_owner waiter != NULL :0
...
<< CUT. systemd-udevd now actually prints the messages that were added
   by i_do_printks-2997. usertaskA-1167 was preempted all that time and
   printed just ONE message before it handed printing duty over to
   systemd-udevd, which had to print lots of pending logbuf messages. >>

<< then another user-space process was lucky enough to set itself as
   console_waiter; it got console_sem ownership from systemd-udevd-672
   and spent enough time in console_unlock() to get killed by the user
   space watchdog >>

I can do more tests a bit later. I really see lots of problems even on
trivial tests [i_do_printks-2997 is preemptible].

another test, which demonstrates preemption and rescheduling: we start
with a ping-pong between i_do_printks-3616 and usertaskA-370, but soon
usertaskA-370 acquires the lock and gets preempted.

 i_do_printks-3616 [001] d..1  1923.711382: offloading: set console_owner :0
 i_do_printks-3616 [001] d..1  1923.724524: offloading: clear console_owner waiter != NULL :0
 i_do_printks-3616 [001] d..1  1923.724685: offloading: set console_owner :0
 usertaskA-370     [000] d..1  1923.734280: offloading: vprintk_emit()->trylock FAIL will spin? :1
 i_do_printks-3616 [001] d..1  1923.737847: offloading: clear console_owner waiter != NULL :1
 usertaskA-370     [000] d..1  1923.737850: offloading: new waiter acquired ownership:0
 i_do_printks-3616 [001] d..1  1923.737850: offloading: hand off :0
 usertaskA-370     [000] d..1  1923.737862: offloading: set console_owner :0
 i_do_printks-3616 [001] d..1  1923.737934: offloading: vprintk_emit()->trylock FAIL will spin? :1
 usertaskA-370     [000] d..1  1923.751870: offloading: clear console_owner waiter != NULL :1
 usertaskA-370     [000] d..1  1923.751875: offloading: hand off :0
 i_do_printks-3616 [001] d..1  1923.751875: offloading: new waiter acquired ownership:0
 i_do_printks-3616 [001] d..1  1923.751938: offloading: set console_owner :0
 usertaskA-370     [003] d..1  1923.762570: offloading: vprintk_emit()->trylock FAIL will spin? :1
 i_do_printks-3616 [001] d..1  1923.765193: offloading: clear console_owner waiter != NULL :1
 i_do_printks-3616 [001] d..1  1923.765196: offloading: hand off :0
 usertaskA-370     [003] d..1  1923.765196: offloading: new waiter acquired ownership:0
 i_do_printks-3616 [002] d..1  1923.766057: offloading: vprintk_emit()->trylock FAIL will spin? :0
 i_do_printks-3616 [002] d..1  1923.766073: offloading: vprintk_emit()->trylock FAIL will spin? :0
...
<< CUT. printks, printks, printks. usertaskA-370 is preempted,
   i_do_printks-3616 just does log_store(). Notice that the new
   console_sem owner acquired the ownership on CPU3, while it started
   to print the pending logbuf messages later, being first scheduled
   on CPU2 and then rescheduled on CPU1. >>
...
 i_do_printks-3616 [002] d..1  1923.767162: offloading: vprintk_emit()->trylock FAIL will spin? :0
 i_do_printks-3616 [002] d..1  1923.767166: offloading: vprintk_emit()->trylock FAIL will spin? :0
 usertaskA-370     [002] d..1  1923.769225: offloading: set console_owner :0
 usertaskA-370     [002] d..1  1923.779213: offloading: clear console_owner waiter != NULL :0
 usertaskA-370     [001] d..1  1923.782622: offloading: set console_owner :0
 usertaskA-370     [001] d..1  1923.793238: offloading: clear console_owner waiter != NULL :0
 usertaskA-370     [001] d..1  1923.794991: offloading: set console_owner :0
 usertaskA-370     [001] d..1  1923.807259: offloading: clear console_owner waiter != NULL :0
 usertaskA-370     [001] d..1  1923.807403: offloading: set console_owner :0
 usertaskA-370     [001] d..1  1923.821280: offloading: clear console_owner waiter != NULL :0
...
<< usertaskA-370 prints all of the pending messages - either until
   logbuf becomes empty, or until another task calls printk [possibly
   from atomic context]. >>

well, clearly, nothing prevents usertaskA-370 from sleeping really
long on a busy system. so the logbuf can contain a lot of messages to
print out. and so on. I guess you got the picture by now.

now, it is absolutely possible to have a preempted console_sem owner
(somewhere in vprintk_emit(), right after it acquired the console_sem
ownership, or somewhere in console_unlock() - before the printing
loop, or after re-enabling local IRQs within the printing loop), while
other CPUs add messages to the logbuf with their printk()-s. then all
it takes is a single printk() from IRQ or other atomic context to take
over printing from the current console_sem owner and to flush all of
the pending messages from atomic context, heading towards a lockup.
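the window is easy to see on the waiter side. again, a condensed
sketch of the v4 code as I read it, not the literal patch; printk_safe
and lockdep details are dropped:

	/* vprintk_emit(), the waiter side */
	if (spin) {
		/*
		 * local IRQs are disabled: we spin until the owner
		 * clears console_waiter, i.e. hands the console_sem
		 * ownership over to us.
		 */
		while (READ_ONCE(console_waiter))
			cpu_relax();

		printk_safe_exit_irqrestore(flags);

		/*
		 * << preemption window >>
		 *
		 * we own the console_sem at this point, but nothing
		 * prevents the scheduler from kicking us off the CPU
		 * right here, before console_unlock() starts flushing
		 * logbuf. until we run again, every printk() on other
		 * CPUs is just a log_store().
		 */
		console_unlock();
	}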
2) an unexpected test...

the scenario which I mentioned last week:

 a) a spin lock lockup on several CPUs (not on all of the CPUs)
 b) simultaneous dump_stack() from the locked up CPUs
 c) trigger_all_cpu_backtrace()
 d) an immediate WARN_ON from a networking IRQ on one of the locked up
    CPUs (the CPUs eventually were able to acquire-release the
    spin_lock and enable local IRQs).

last week the board had immediate printk offloading enabled when the
lockup happened, so all of those printk-s from IRQs/etc. were as fast
as lock(log_buf); sprintf(); memcpy(); unlock(log_buf); - there was
printk_kthread to print it all out. the issue all of a sudden
reproduced today. twice. this time I had immediate printk offloading
disabled and the console_owner hand off enabled. unfortunately, the
board didn't survive the lockup this time. _it looks_ like printk
offloading was really crucial there.
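"immediate printk offloading" above refers to my async printk setup,
not to anything in mainline. schematically, it is just the following
[pending_output() is a made-up name here for the "logbuf has unseen
messages" check]:

	/* the printk() side: vprintk_emit() does not touch the
	 * console at all, it only stores the message and wakes
	 * the dedicated kthread */
	log_store(facility, level, ..., text, text_len);
	wake_up_process(printk_kthread);

	/* the printk_kthread side: does the actual printing from
	 * a schedulable, non-atomic context */
	static int printk_kthread_func(void *data)
	{
		while (1) {
			set_current_state(TASK_INTERRUPTIBLE);
			if (!pending_output())
				schedule();
			__set_current_state(TASK_RUNNING);

			console_lock();
			console_unlock();	/* flush logbuf */
		}
		return 0;
	}

so a printk() from IRQ or atomic context costs one log_store() plus a
wakeup, and the flushing is always done by a task that can sleep.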
> Why did you completely ignore the paragraph about the step by step
> approach? Is there anything wrong about it?

frankly, I don't even understand what our plan is. I don't see how we
are going to verify the patch. over the last 3 years, how many emails
do you have from facebook or samsung or linaro, or any other company,
reporting printk-related lockups? I don't have that many in my gmail
inbox, to be honest. and this is not because no printk lockups were
observed. this is because people usually don't report those issues to
the upstream community; especially vendors that use "outdated" LTS
kernels, which are approx 1-2 years behind the mainline. and I don't
see why it should be different this time. it will take years before
vendors pick up the next LTS kernel, the one which will have that
patch in it. but the really big problem here is that we already know
that the patch has some problems. are we going to conclude that "no
emails === no problems"? with all my respect, it does seem like, in
the grand scheme of things, we are going to do the same thing, yet
expect a different result. that's my worry. maybe I'm wrong.

> You are looking for a perfect solution.

gentlemen, once again, you really can name it whatever you like. the
bottom line is [speaking for myself only]:

- the patch does not fit my needs.
- it does not address the issues I'm dealing with.
- it has a significantly worse behaviour compared to the old async
  printk.
- it keeps tasks sleeping on console_sem in TASK_UNINTERRUPTIBLE for
  a long time.
- it makes user space apps time out.
- it re-introduces the lockup vector, by passing console_sem ownership
  to atomic tasks.
- it doubles the amount of time a CPU can spin with local IRQs
  disabled in the worst case.
- I probably need to run more tests [I didn't execute any OOM tests,
  etc.], but, preliminarily, I can't see Samsung using this patch.
  sorry to tell you that. maybe it's good enough for the rest of the
  world, but...

	-ss