On 10/24, Gautham R Shenoy wrote: > (reordered)
> With get_online_cpus()/put_online_cpus(), we can eliminate > the workqueue_mutex and reintroduce the workqueue_lock, > which is a spinlock which serializes the accesses to the > workqueues list. This change is obviously good, can't it go into the previous patch? Because, > Solution is not to cleanup the worker thread. Instead let it remain > even after the cpu goes offline. Since no one can queue any work > on an offlined cpu, this thread will be forever sleeping, untill > someone onlines the cpu. I still think this patch is questionable. Please look at my previous response http://marc.info/?l=linux-kernel&m=119262203729543 In short: with this patch it is not possible to guarantee that work->fun() will run on the correct CPU. > static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq, int > cpu) > { > /* > - * Our caller is either destroy_workqueue() or CPU_DEAD, > - * workqueue_mutex protects cwq->thread > + * Our caller is destroy_workqueue(). So warn on a double > + * destroy. > */ > - if (cwq->thread == NULL) > + if (cwq->thread == NULL) { > + WARN_ON(1); Looks wrong. It is possible that cwq->thread == NULL, because currently we never "shrink" cpu_populated_map. > cleanup_workqueue_thread() in the CPU_DEAD and CPU_UP_CANCELLED path > will cause a deadlock if the worker thread is executing a work item > which is blocked on get_online_cpus(). This will lead to a irrecoverable > hang. Yes. But there is nothing new. Currently, work->func() can't share the locks with cpu_down's patch. Not only only it can't take workqueue_mutex, it can't take any other lock which could be taken by notifier callbacks, etc. Can't we ignore this problem, at least for now? I believe we need intrusive changes to solve this problem correctly. Perhaps I am wrong, of course, but I don't see a simple solution. Another option. Note that get_online_cpus() does more than just pinning cpu maps, actually it blocks hotplug entirely. Now let's look at schedule_on_each_cpu(), for example. It doesn't need to block hotplug, it only needs a stable cpu_online_map. Suppose for a moment that _cpu_down() does cpu_hotplug_done() earlier, right after __cpu_die(cpu) which removes CPU from the map (yes, this is wrong, I know). Now, we don't need to change workqueue_cpu_callback(), work->func() can use get_online_cpus() without fear of deadlock. So, can't we introduce 2 nested rw locks? The first one blocks cpu hotplug (like get_online_cpus does currently), the second one just pins cpu maps. I think most users needs only this, not more. What do you think? (Gautham, I apologize in advance, can't be responsive till weekend). Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/