On 01/07, Srivatsa Vaddagiri wrote:
>
> On Sat, Jan 06, 2007 at 08:34:16PM +0300, Oleg Nesterov wrote:
> > I suspect this can't help either.
> >
> > The problem is that flush_workqueue() may be called while cpu hotplug event
> > in progress and CPU_DEAD waits for kthread_stop(), so we have the same dead
> > lock if work->func() does flush_workqueue(). This means that Andrew's change
> > to use preempt_disable() is good and anyway needed.
>
> Well ..a lock_cpu_hotplug() in run_workqueue() and support for recursive
> calls to lock_cpu_hotplug() by the same thread will avoid the problem
> you mention.

Srivatsa, I'm completely new to cpu-hotplug, so please correct me if I'm
wrong (in fact I _hope_ I am wrong), but as I see it, the hotplug/workqueue
interaction is broken by design; it can't be fixed by changing the locking
alone.

Once again. The CPU dies, CPU_DEAD calls kthread_stop() and sleeps until
cwq->thread exits. To exit, this thread must at least complete the
currently running work->func().

work->func() calls flush_workqueue(WQ), which does lock_cpu_hotplug() or
_whatever_. Now the question: does it block?

if YES:
	This is what the stable tree does - deadlock.

if NOT:
	This is what we have with Andrew's s/mutex_lock/preempt_disable/
	patch - race or deadlock, we have a choice.

	Suppose that WQ has pending works on that dead CPU. Note that at
	this point the CPU is no longer present in cpu_online_map. This
	means that (without other changes) we have lost.

	- flush_workqueue(WQ) can't return until CPU_DEAD transfers these
	  works to some other CPU in cpu_online_map.

	- CPU_DEAD can't do take_over_work() until flush_workqueue() returns.

Andrew, Ingo, this also means that the freezer can't solve this particular
problem either (if I am right).

Thoughts?

Oleg.
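
PS: the two cases above can be written down as a toy wait-for graph. This
is only a sketch in Python, not kernel code. The node names are labels
taken from this mail, build_waits() and find_cycle() are made-up helpers,
and the "if YES" branch assumes the hotplug path holds the lock for as long
as CPU_DEAD runs. Each entry says "X cannot finish until Y does"; a cycle
means nobody can make progress.

```python
def build_waits(flush_blocks):
    """Wait-for edges for the scenario in this mail (labels only)."""
    waits = {
        "CPU_DEAD: kthread_stop()": "cwq->thread: work->func()",
        "cwq->thread: work->func()": "flush_workqueue(WQ)",
    }
    if flush_blocks:
        # Assumption: the lock is held by the hotplug path until CPU_DEAD is done.
        waits["flush_workqueue(WQ)"] = "hotplug lock released"
        waits["hotplug lock released"] = "CPU_DEAD: kthread_stop()"
    else:
        # The flush does not block on the lock, but WQ has works queued on the dead CPU.
        waits["flush_workqueue(WQ)"] = "pending works on dead CPU"
        waits["pending works on dead CPU"] = "CPU_DEAD: take_over_work()"
        waits["CPU_DEAD: take_over_work()"] = "CPU_DEAD: kthread_stop()"
    return waits


def find_cycle(waits):
    """Return the nodes of a wait-for cycle, or None if there is none."""
    for start in waits:
        path, seen = [start], {start}
        node = waits.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a cycle that does not include start
            seen.add(node)
            path.append(node)
            node = waits.get(node)
    return None


if __name__ == "__main__":
    for flush_blocks in (True, False):
        cycle = find_cycle(build_waits(flush_blocks))
        print("flush blocks:", flush_blocks)
        print("  cycle:", " -> ".join(cycle) if cycle else "none")
```

In this model both branches end in a cycle through kthread_stop(), which is
why I think a locking change alone can't help.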