If hotplug code grabbed the manager_mutex and worker_thread try to create
a worker, the manage_worker() will return false and worker_thread go to
process work items. Now, on the CPU, all workers are processing work items,
no idle_worker left/ready for managing. It breaks the concept of workqueue
and it is bug.

So when manage_worker() failed to grab the manager_mutex, it should
release gcwq->lock and then grab manager_mutex.

After gcwq->lock is released, hotplug can happen. but the hoplug code
can't unbind/rebind the manager, so the manager should try to rebind
itself unconditionaly, if it fails, unbind itself.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
---
 kernel/workqueue.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 383548e..74434c8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1825,10 +1825,39 @@ static bool manage_workers(struct worker *worker)
        struct worker_pool *pool = worker->pool;
        bool ret = false;
 
-       if (!mutex_trylock(&pool->manager_mutex))
+       if (pool->flags & POOL_MANAGING_WORKERS)
                return ret;
 
        pool->flags |= POOL_MANAGING_WORKERS;
+
+       if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+               /*
+                * Ouch! rebind_workers() or gcwq_unbind_fn() beats it.
+                * it can't return false here, otherwise it will lead to
+                * worker depletion. So we release gcwq->lock and then
+                * grab manager_mutex again.
+                */
+               spin_unlock_irq(&pool->gcwq->lock);
+               mutex_lock(&pool->manager_mutex);
+
+               /*
+                * The hotplug had happened after the previous releasing
+                * of gcwq->lock. So we can't assume that this worker is
+                * still associated or not. And we have to try to rebind it
+                * via worker_maybe_bind_and_lock(). If it returns false,
+                * we can conclude that the whole gcwq is disassociated,
+                * and we must unbind this worker. (hotplug code can't
+                * unbind/rebind the manager, because hotplug code can't
+                * iterate the manager)
+                */
+               if (worker_maybe_bind_and_lock(worker))
+                       worker->flags &= ~WORKER_UNBOUND;
+               else
+                       worker->flags |= WORKER_UNBOUND;
+
+               ret = true;
+       }
+
        pool->flags &= ~POOL_MANAGE_WORKERS;
 
        /*
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to