[PATCH] destroy_workqueue() can livelock

Oleg Nesterov Fri, 13 Jul 2007 06:19:16 -0700

Pointed out by Michal Schmidt <[EMAIL PROTECTED]>.

The bug was introduced in 2.6.22 by me.


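cleanup_workqueue_thread() calls flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This can livelock: flush_cpu_workqueue() also
queues a barrier while ->current_work is set, and a re-niced caller can get
the CPU right after wake_up() and insert a new barrier before the
lower-priority cwq->thread has a chance to clear ->current_work. The next
iteration then finds ->current_work still set, and the cycle repeats.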

Change cleanup_workqueue_thread() to call flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it has flushed
all pending works, so it is safe to call kthread_stop() afterwards: the
"should stop" request won't be noticed until run_workqueue() returns.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- t/kernel/workqueue.c~LIVELOCK       2007-06-13 18:26:56.000000000 +0400
+++ t/kernel/workqueue.c        2007-07-13 16:46:27.000000000 +0400
@@ -739,18 +739,17 @@ static void cleanup_workqueue_thread(str
        if (cwq->thread == NULL)
                return;
 
+       flush_cpu_workqueue(cwq);
        /*
-        * If the caller is CPU_DEAD the single flush_cpu_workqueue()
-        * is not enough, a concurrent flush_workqueue() can insert a
-        * barrier after us.
+        * If the caller is CPU_DEAD and cwq->worklist was not empty,
+        * a concurrent flush_workqueue() can insert a barrier after us.
+        * However, in that case run_workqueue() won't return and check
+        * kthread_should_stop() until it flushes all work_struct's.
         * When ->worklist becomes empty it is safe to exit because no
         * more work_structs can be queued on this cwq: flush_workqueue
         * checks list_empty(), and a "normal" queue_work() can't use
         * a dead CPU.
         */
-       while (flush_cpu_workqueue(cwq))
-               ;
-
        kthread_stop(cwq->thread);
        cwq->thread = NULL;
 }
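To make the livelock and the fix concrete, here is a toy, single-threaded C model. It is not kernel code: struct model_cwq and every model_* function are hypothetical, works (including barriers) are just counted, and scheduling is collapsed into a caller_preempts flag. That flag assumes the re-niced flusher always runs right after its barrier's complete() and before the worker clears ->current_work. Under that assumption the old "while (flush_cpu_workqueue(cwq))" loop never terminates, while a single flush followed by a modeled kthread_stop() does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one cpu_workqueue_struct. NOT kernel code: the struct
 * and all model_* helpers are hypothetical. Works are only counted.
 */
struct model_cwq {
	int  pending;         /* works still on ->worklist */
	bool current_work;    /* ->current_work != NULL */
	bool caller_preempts; /* flusher runs before worker clears ->current_work */
	bool stopped;         /* worker thread has exited */
};

/* Worker gets the CPU: finish the work it was preempted in, then run
 * queued works in order. If allowed, stop right after the last (tail)
 * work, i.e. the flusher's barrier, has called complete(). */
static void model_worker_run(struct model_cwq *cwq, bool allow_preempt)
{
	cwq->current_work = false;
	while (cwq->pending > 0) {
		cwq->pending--;
		cwq->current_work = true;
		if (cwq->pending == 0 && allow_preempt && cwq->caller_preempts)
			return; /* woken flusher preempts; ->current_work stays set */
		cwq->current_work = false;
	}
}

/* Shape of the flush check: queue a tail barrier and wait if ->worklist
 * is non-empty OR a work is still marked current. Returns 1 if it waited. */
static int model_flush_cpu_workqueue(struct model_cwq *cwq)
{
	if (cwq->pending == 0 && !cwq->current_work)
		return 0;
	cwq->pending++;              /* barrier goes to the tail */
	model_worker_run(cwq, true); /* wait_for_completion() */
	return 1;
}

/* Old cleanup: loop until a flush had nothing to wait for. Returns how
 * many flushes waited, or -1 once max_iters is hit (treated as livelock). */
static int model_old_cleanup(struct model_cwq *cwq, int max_iters)
{
	int iters = 0;

	while (model_flush_cpu_workqueue(cwq)) {
		if (++iters >= max_iters)
			return -1;
	}
	cwq->stopped = true;
	return iters;
}

/* New cleanup: flush once, then kthread_stop(). late_works models barriers
 * a concurrent flush_workqueue() queued after ours. The worker only checks
 * kthread_should_stop() after run_workqueue() returns, i.e. after it has
 * drained ->worklist, so those works still run before the thread exits. */
static void model_new_cleanup(struct model_cwq *cwq, int late_works)
{
	model_flush_cpu_workqueue(cwq);
	cwq->pending += late_works;
	model_worker_run(cwq, false); /* kthread_stop() waits for the exit */
	assert(cwq->pending == 0 && !cwq->current_work);
	cwq->stopped = true;
}
```

With caller_preempts set, model_old_cleanup() hits its iteration cap and returns -1, while model_new_cleanup() returns with the list drained; with caller_preempts clear, the old loop also terminates after one waiting flush. The model only illustrates the ordering argument in the changelog, not real scheduler behaviour.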
