Re: [PATCH] fix-flush_workqueue-vs-cpu_dead-race-update

Oleg Nesterov Sun, 07 Jan 2007 04:56:18 -0800

On 01/07, Srivatsa Vaddagiri wrote:
>
> On Sat, Jan 06, 2007 at 08:34:16PM +0300, Oleg Nesterov wrote:
> > I suspect this can't help either.
> > 
> > The problem is that flush_workqueue() may be called while a cpu hotplug event
> > is in progress and CPU_DEAD waits for kthread_stop(), so we have the same
> > deadlock if work->func() does flush_workqueue(). This means that Andrew's change
> > to use preempt_disable() is good and needed anyway.
> 
> Well, a lock_cpu_hotplug() in run_workqueue() and support for recursive
> calls to lock_cpu_hotplug() by the same thread will avoid the problem
> you mention.


Srivatsa, I'm completely new to cpu-hotplug, so please correct me if I'm
wrong (in fact I _hope_ I am wrong) but as I see it, the hotplug/workqueue
interaction is broken by design; it can't be fixed just by changing the locking.

Once again: a CPU dies, CPU_DEAD calls kthread_stop() and sleeps until
cwq->thread exits. To do so, this thread must at least complete the
currently running work->func().

work->func() calls flush_workqueue(WQ), which does lock_cpu_hotplug() or
_whatever_. Now the question: does it block?

if YES:
        This is what the stable tree does - deadlock.

if NO:
        This is what we have with Andrew's s/mutex_lock/preempt_disable/
        patch - race or deadlock, we have a choice.

        Suppose that WQ has pending works on that dead CPU. Note that
        at this point this CPU is no longer present in cpu_online_map.
        This means that (without other changes) we have lost:

                - flush_workqueue(WQ) can't return until CPU_DEAD transfers
                  these works to another CPU in cpu_online_map.

                - CPU_DEAD can't do take_over_work() until flush_workqueue()
                  returns, because it is still sleeping in kthread_stop()
                  waiting for the work->func() that called it.
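A user-space toy model may make the circular wait easier to see. This sketch is Python, not kernel code: simulate(), TIMEOUT, the state dictionary and the use of a single threading.Lock to stand in for lock_cpu_hotplug() are all assumptions made for illustration, and a timeout replaces "blocks forever" so the program terminates. In both branches (flush takes the hotplug lock, or it doesn't) the flush gives up, i.e. the model deadlocks.

```python
import threading

TIMEOUT = 0.5  # seconds; a bounded stand-in for "sleeps forever"


def simulate(flush_takes_hotplug_lock):
    """Toy user-space model of CPU_DEAD racing with flush_workqueue().

    flush_takes_hotplug_lock=True models the "if YES" case: flush blocks
    on the hotplug lock. False models the "if NO" case: flush does not
    take the lock but still waits for the works queued on the dead CPU.
    Returns "ok" if flush_workqueue() completed, "deadlock" if it gave up.
    """
    hotplug_lock = threading.Lock()  # hypothetical stand-in for lock_cpu_hotplug()
    cond = threading.Condition()
    state = {"pending_on_dead_cpu": 1, "result": None}

    def flush_workqueue():
        if flush_takes_hotplug_lock:
            # "if YES": CPU_DEAD already holds the lock, so this blocks.
            if not hotplug_lock.acquire(timeout=TIMEOUT):
                state["result"] = "deadlock"
                return
            hotplug_lock.release()
        # Wait until the work queued on the dead CPU has run; that only
        # happens after CPU_DEAD moves it to an online CPU.
        with cond:
            done = cond.wait_for(lambda: state["pending_on_dead_cpu"] == 0,
                                 timeout=TIMEOUT)
        state["result"] = "ok" if done else "deadlock"

    def cwq_thread():
        # The currently running work->func() calls flush_workqueue(WQ);
        # the thread cannot exit before that call returns.
        flush_workqueue()

    def take_over_work():
        # Move the dead CPU's pending works to an online CPU (modelled
        # here as simply marking them as run) and wake any flusher.
        with cond:
            state["pending_on_dead_cpu"] = 0
            cond.notify_all()

    # CPU_DEAD handling, done with the hotplug lock held.
    with hotplug_lock:
        worker = threading.Thread(target=cwq_thread)
        worker.start()
        worker.join()     # like kthread_stop(): sleep until cwq->thread exits
        take_over_work()  # reached only after the worker has exited
    return state["result"]


if __name__ == "__main__":
    for takes_lock in (True, False):
        print(takes_lock, simulate(takes_lock))
    # prints "True deadlock" and then "False deadlock"
```

Starting the worker from inside the CPU_DEAD handler is a simplification; in the kernel cwq->thread already exists and is only being stopped.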

Andrew, Ingo, this also means that the freezer can't solve this particular
problem either (if I am right).

Thoughts?

Oleg.

