On 14-07-16, 16:12, Jan Kara wrote: > Exactly. Calling printk() from certain parts of the kernel (like scheduler > code or timer code) has been always unsafe because printk itself uses these > parts and so it can lead to deadlocks. That's why printk_deffered() has > been introduced as you mention below. > > And with sync printk the above deadlock doesn't trigger only by chance - if > there happened to be a waiter on console_sem while we suspend, the same > deadlock would trigger because up(&console_sem) will try to wake him up and > the warning in timekeeping code will cause recursive printk. > > So I think your patch doesn't really address the real issue - it only > works around the particular WARN_ON(timekeeping_enabled) warning but if > there was a different warning in timekeeping code which would trigger, it > has a potential for causing recursive printk deadlock (and indeed we had > such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion > between hrtimer_bases.lock and scheduler locks"). > > So there are IMHO two issues here worth looking at: > > 1) I didn't find how a wakeup would would lead to calling to ktime_get() in > the current upstream kernel or even current RT kernel. Maybe this is a > problem specific to the 3.10 kernel you are using? If yes, we don't have to > do anything for current upstream AFAIU.
I haven't checked that earlier, but I see the path in both 3.10 and mainline. vprintk_emit -> wake_up_process -> try_to_wake_up -> ttwu_queue -> ttwu_do_activate -> ttwu_activate -> activate_task -> enqueue_task (sched/core.c) -> enqueue_task_rt (rt.c) -> enqueue_rt_entity -> __enqueue_rt_entity -> inc_rt_tasks -> inc_rt_group -> start_rt_bandwidth -> start_bandwidth_timer -> __hrtimer_start_range_ns -> ktime_get() > If I just missed how wakeup can call into ktime_get() in current upstream, > there is another question: > > 2) Is it OK that printk calls wakeup so late during suspend? To clarify again to everybody, we are talking about the place where all non-boot CPUs are already hot-unplugged and the last running one has disabled interrupts. I believe that we can't do migration at all now, right? What will we get by calling wake_up_process() now anyway ? > I believe it > is but I'm neither scheduler nor suspend expert. If it is OK, and wakeup > can lead to ktime_get() in current upstream, then this contradicts the > check WARN_ON(timekeeping_suspended) in ktime_get() and something is wrong. > > Adding Thomas to CC as timer / RT expert... Thanks. -- viresh