When a CPU is finally put down in either CPU_UP_CANCELED or CPU_POST_DEAD, cpu_stop_cpu_callback() signals immediate completion on all cpu_stop_works still queued on the dead CPU. Unfortunately, this code is buggy: it doesn't remove the canceled work items from stopper->works, which leaves the list corrupted and will trigger BUG_ON() during CPU_UP_PREPARE if the CPU is brought back online.
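The failure mode can be illustrated outside the kernel with a minimal userspace sketch. This is not the kernel code: struct work, the demo_list_*() helpers, drain_signal_only() and drain_pop() are hypothetical stand-ins for cpu_stop_work, the <linux/list.h> primitives, and the old and new drain loops. It only shows that signaling without unlinking leaves the list populated, while popping each item first leaves it empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's circular list. */
struct list_head { struct list_head *next, *prev; };

static void demo_list_init(struct list_head *h) { h->next = h->prev = h; }

static bool demo_list_empty(const struct list_head *h) { return h->next == h; }

static void demo_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void demo_list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical stand-in for struct cpu_stop_work. */
struct work { struct list_head list; bool signaled; };

#define work_of(ptr) \
	((struct work *)((char *)(ptr) - offsetof(struct work, list)))

/* Old behavior: signal every item but leave it linked on the list. */
static void drain_signal_only(struct list_head *works)
{
	for (struct list_head *p = works->next; p != works; p = p->next)
		work_of(p)->signaled = true;
}

/* Patched behavior: unlink each item first, then signal it. */
static void drain_pop(struct list_head *works)
{
	while (!demo_list_empty(works)) {
		struct work *w = work_of(works->next);

		demo_list_del_init(&w->list);
		w->signaled = true;
	}
}

/* Queue two items, drain them, and report whether the list ended up empty. */
static bool empty_after_drain(bool pop)
{
	struct list_head works;
	struct work a = { .signaled = false }, b = { .signaled = false };

	demo_list_init(&works);
	demo_list_add_tail(&a.list, &works);
	demo_list_add_tail(&b.list, &works);
	if (pop)
		drain_pop(&works);
	else
		drain_signal_only(&works);
	return demo_list_empty(&works);
}
```

In this sketch empty_after_drain(false) reports a non-empty list, which corresponds to the stale stopper->works state described above, while empty_after_drain(true) reports an empty one.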

This bug isn't easily triggered because CPU_DOWN has to race against
cpu_stop calls, and most, if not all, cpu_stop users pin their target CPUs.

Fix it by popping each work item off stopper->works.

Thanks to Tejun for sharing the commit message, again.

Signed-off-by: Hillf Danton <dhi...@gmail.com>
Reviewed-by: Namhyung Kim <namhy...@kernel.org>
Cc: sta...@vger.kernel.org
---

--- a/kernel/stop_machine.c	Sun Feb 10 13:00:00 2013
+++ b/kernel/stop_machine.c	Sun Feb 10 13:02:18 2013
@@ -334,23 +334,24 @@ static int __cpuinit cpu_stop_cpu_callba
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
 	case CPU_POST_DEAD:
-	{
-		struct cpu_stop_work *work;
-
 		sched_set_stop_task(cpu, NULL);
 		/* kill the stopper */
 		kthread_stop(stopper->thread);
 		/* drain remaining works */
 		spin_lock_irq(&stopper->lock);
-		list_for_each_entry(work, &stopper->works, list)
+		while (!list_empty(&stopper->works)) {
+			struct cpu_stop_work *work;
+			work = list_first_entry(&stopper->works,
+					struct cpu_stop_work, list);
+			list_del_init(&work->list);
 			cpu_stop_signal_done(work->done, false);
+		}
 		stopper->enabled = false;
 		spin_unlock_irq(&stopper->lock);
 		/* release the stopper */
 		put_task_struct(stopper->thread);
 		stopper->thread = NULL;
 		break;
-	}
 #endif
 	}