Hello, again.

On Fri, Feb 08, 2013 at 11:42:43AM +0800, Hillf Danton wrote:
> As checked with BUG_ON in the case of CPU_UP_PREPARE, we have to dequeue
> work first for further actions, then stopper reaches sane and clear state.

When a CPU is finally put down in either CPU_UP_CANCELLED or
CPU_POST_DEAD, cpu_stop_cpu_callback() signals immediate completion on
all cpu_stop_works still queued on the dead CPU; unfortunately, this
code is buggy in that it doesn't remove the canceled work items from
stopper->works, leaving the list corrupted, which will trigger BUG_ON()
during CPU_UP_PREPARE if the CPU is brought back online.

This bug isn't easily triggered because CPU_DOWN has to race against
cpu_stop calls and most, if not all, cpu_stop users pin target CPUs.

Fix it by popping each work item off stopper->works.

> Signed-off-by: Hillf Danton <dhi...@gmail.com>

Maybe Cc: sta...@vger.kernel.org

> --- a/kernel/stop_machine.c	Fri Feb 8 11:22:44 2013
> +++ b/kernel/stop_machine.c	Fri Feb 8 11:29:40 2013
> @@ -342,8 +342,12 @@ static int __cpuinit cpu_stop_cpu_callba
>  		kthread_stop(stopper->thread);
>  		/* drain remaining works */
>  		spin_lock_irq(&stopper->lock);
> -		list_for_each_entry(work, &stopper->works, list)
> +		while (!list_empty(&stopper->works)) {
> +			work = list_first_entry(&stopper->works,
> +					struct cpu_stop_work, list);
> +			list_del_init(&work->list);
>  			cpu_stop_signal_done(work->done, false);
> +		}

I think your previous version was better, with the @work declaration
moved inside the while() loop.

Thanks.

-- 
tejun