On 09/07/2012 04:08 AM, Tejun Heo wrote:
>>From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <t...@kernel.org>
> Date: Thu, 6 Sep 2012 12:50:41 -0700
>
> To simplify both normal and CPU hotplug paths, while CPU hotplug is in
> progress, manager_mutex is held to prevent one of the workers from
> becoming a manager and creating or destroying workers; unfortunately,
> it currently may lead to idle worker depletion which in turn can lead
> to deadlock under extreme circumstances.
>
> Idle workers aren't allowed to become busy if there's no other idle
> worker left to create more idle workers, but during CPU_ONLINE
> gcwq_associate() is holding all managerships and all the idle workers
> can proceed to become busy before gcwq_associate() is finished.

Any code which grabs the manage_mutex can cause the bug: not only
rebind_workers() but also gcwq_unbind_fn(), and not only during
CPU_ONLINE but also during CPU_DOWN_PREPARE.

> This patch fixes the bug by releasing manager_mutexes before letting
> the rebound idle workers go.  This ensures that by the time idle
> workers check whether management is necessary, CPU_ONLINE already has
> released the positions.

This can't fix the problem.

+	gcwq_claim_management(gcwq);
+	spin_lock_irq(&gcwq->lock);

If manage_workers() happens between these two lines, the problem occurs!

My non_manager_role_manager_mutex_unlock() approach has the same idea:
release manage_mutex before releasing gcwq->lock.  In that approach, a
worker which fails to grab manage_lock detects the reason for the
failure and goes to sleep; rebind_workers()/gcwq_unbind_fn() release
manage_mutex and then wake some of them up before releasing gcwq->lock.

==========================
A "release manage_mutex before releasing gcwq->lock" approach
(no one likes it, I think):

/* claim manager positions of all pools */
static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
{
	struct worker_pool *pool, *pool_fail;

again:
	spin_lock_irq(&gcwq->lock);
	for_each_worker_pool(pool, gcwq) {
		if (!mutex_trylock(&pool->manager_mutex))
			goto fail;
	}
	return;

fail:	/* unlikely, because manage_workers() is a very unlikely path in my box */
	/* unwind the managerships acquired before the pool that failed */
	for_each_worker_pool(pool_fail, gcwq) {
		if (pool_fail != pool)
			mutex_unlock(&pool_fail->manager_mutex);
		else
			break;
	}
	spin_unlock_irq(&gcwq->lock);
	cpu_relax();
	goto again;
}

/* release manager positions */
static void gcwq_release_management_and_unlock(struct global_cwq *gcwq)
{
	struct worker_pool *pool;

	for_each_worker_pool(pool, gcwq)
		mutex_unlock(&pool->manager_mutex);
	spin_unlock_irq(&gcwq->lock);
}

(A standalone userspace sketch of this trylock-under-lock pattern
appears after the quoted patch below.)

>
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Reported-by: Lai Jiangshan <la...@cn.fujitsu.com>
> ---
>  kernel/workqueue.c |   20 ++++++++++++++------
>  1 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index b19170b..74487ef 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1454,10 +1454,19 @@ retry:
>  	}
>
>  	/*
> -	 * All idle workers are rebound and waiting for %WORKER_REBIND to
> -	 * be cleared inside idle_worker_rebind().  Clear and release.
> -	 * Clearing %WORKER_REBIND from this foreign context is safe
> -	 * because these workers are still guaranteed to be idle.
> +	 * At this point, each pool is guaranteed to have at least one idle
> +	 * worker and all idle workers are waiting for WORKER_REBIND to
> +	 * clear.  Release management before releasing idle workers;
> +	 * otherwise, they can all go become busy as we're holding the
> +	 * manager_mutexes, which can lead to deadlock as we don't actually
> +	 * create new workers.
> +	 */
> +	gcwq_release_management(gcwq);
> +
> +	/*
> +	 * Clear %WORKER_REBIND and release.  Clearing it from this foreign
> +	 * context is safe because these workers are still guaranteed to be
> +	 * idle.
>  	 *
>  	 * We need to make sure all idle workers passed WORKER_REBIND wait
>  	 * in idle_worker_rebind() before returning; otherwise, workers can
> @@ -1467,6 +1476,7 @@ retry:
>  	INIT_COMPLETION(idle_rebind.done);
>
>  	for_each_worker_pool(pool, gcwq) {
> +		WARN_ON_ONCE(list_empty(&pool->idle_list));
>  		list_for_each_entry(worker, &pool->idle_list, entry) {
>  			worker->flags &= ~WORKER_REBIND;
>  			idle_rebind.cnt++;
> @@ -1481,8 +1491,6 @@ retry:
>  	} else {
>  		spin_unlock_irq(&gcwq->lock);
>  	}
> -
> -	gcwq_release_management(gcwq);
>  }
>
>  static struct worker *alloc_worker(void)
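
For illustration only, a minimal userspace model of the trylock-under-lock
claiming pattern in gcwq_claim_management_and_lock() above.  This is not
kernel code: the pool count, type names, and function names are made up,
and an ordinary pthread mutex stands in for gcwq->lock.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NR_POOLS 2

struct pool {
	pthread_mutex_t manager_mutex;
};

static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pool pools[NR_POOLS] = {
	{ PTHREAD_MUTEX_INITIALIZER },
	{ PTHREAD_MUTEX_INITIALIZER },
};

/*
 * Take the outer lock, then trylock every manager mutex under it.
 * If any trylock fails, back out, drop the outer lock and retry, so
 * "outer lock held" and "all managerships held" only become true
 * together.
 */
static void claim_management_and_lock(void)
{
	int i, got;

	for (;;) {
		pthread_mutex_lock(&outer_lock);
		for (got = 0; got < NR_POOLS; got++)
			if (pthread_mutex_trylock(&pools[got].manager_mutex))
				break;
		if (got == NR_POOLS)
			return;	/* success: outer lock + all managerships held */

		/* someone else is managing: unwind and try again */
		for (i = 0; i < got; i++)
			pthread_mutex_unlock(&pools[i].manager_mutex);
		pthread_mutex_unlock(&outer_lock);
		sched_yield();	/* stands in for cpu_relax() */
	}
}

/* Release the managerships first, then the outer lock. */
static void release_management_and_unlock(void)
{
	int i;

	for (i = 0; i < NR_POOLS; i++)
		pthread_mutex_unlock(&pools[i].manager_mutex);
	pthread_mutex_unlock(&outer_lock);
}

int main(void)
{
	claim_management_and_lock();
	puts("claimed all managerships together with the outer lock");
	release_management_and_unlock();
	return 0;
}

(Build with: gcc -pthread model.c.)  The reason the proposal above uses
mutex_trylock() rather than mutex_lock() is that gcwq->lock is a spinlock
taken with IRQs disabled, so sleeping on the manager mutex while holding
it is not allowed; the userspace model keeps the same shape with a plain
mutex as the outer lock.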