On Wed, Oct 07, 2015 at 10:41:10AM +0200, Peter Zijlstra wrote:
> Hi,
>
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> happens.
>
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
>
> This _is_ possible because we set 'online && active' _before_ we do the
> smpboot_unpark thing because of ONLINE notifier order.
>
> The below test patch manually starts the stopper task early.
>
> It boots and hotplugs a cpu on my test box so its not insta broken.
>
> ---
>  kernel/sched/core.c   | 7 ++++++-
>  kernel/stop_machine.c | 5 +++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1764a0f..9a56ef7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
>          rq->age_stamp = sched_clock_cpu(cpu);
>  }
>
> +extern void cpu_stopper_unpark(unsigned int cpu);
> +
>  static int sched_cpu_active(struct notifier_block *nfb,
>                              unsigned long action, void *hcpu)
>  {
> +        int cpu = (long)hcpu;
> +
>          switch (action & ~CPU_TASKS_FROZEN) {
>          case CPU_STARTING:
>                  set_cpu_rq_start_time();
>                  return NOTIFY_OK;
>          case CPU_ONLINE:
> +                cpu_stopper_unpark(cpu);
>                  /*
>                   * At this point a starting CPU has marked itself as online via
>                   * set_cpu_online(). But it might not yet have marked itself
> @@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
>                   * Thus, fall-through and help the starting CPU along.
>                   */
>          case CPU_DOWN_FAILED:
> -                set_cpu_active((long)hcpu, true);
> +                set_cpu_active(cpu, true);
>                  return NOTIFY_OK;
>          default:
>                  return NOTIFY_DONE;
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5..c674371 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
>          .selfparking            = true,
>  };
>
> +void cpu_stopper_unpark(unsigned int cpu)
> +{
> +        kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
> +}
> +
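[ Side note, mostly for the archives: the "one cpu waiting for another
  that never happens" symptom is what the rendezvous in multi_cpu_stop()
  degenerates to when only one of the two expected threads ever runs --
  the shared state machine only advances after every expected participant
  has acked the current state.  Below is a minimal, self-contained
  user-space model of that rendezvous (simplified and illustrative, not
  the kernel source): with num_threads == 2 but only one participant
  started, the ack count never reaches zero and the lone participant
  spins forever. ]

/*
 * Toy model of the multi_cpu_stop() rendezvous (simplified, not the
 * kernel source).  Each participant spins until the shared state
 * changes, acks it, and the state only advances once every expected
 * participant has acked.  Starting only one of two expected threads
 * makes the program hang, mirroring the reported symptom.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum state { STOP_NONE, STOP_PREPARE, STOP_RUN, STOP_EXIT };

struct multi_stop_data {
        atomic_int state;       /* current rendezvous state */
        atomic_int thread_ack;  /* participants yet to ack this state */
        int num_threads;
};

static void set_state(struct multi_stop_data *d, int newstate)
{
        atomic_store(&d->thread_ack, d->num_threads);
        atomic_store(&d->state, newstate);
}

static void ack_state(struct multi_stop_data *d)
{
        /* The last participant to ack advances the state machine. */
        if (atomic_fetch_sub(&d->thread_ack, 1) == 1)
                set_state(d, atomic_load(&d->state) + 1);
}

static void *participant(void *arg)
{
        struct multi_stop_data *d = arg;
        int curstate = STOP_NONE;

        do {
                if (atomic_load(&d->state) != curstate) {
                        curstate = atomic_load(&d->state);
                        /* the real code disables irqs / runs fn() here */
                        ack_state(d);
                }
        } while (curstate != STOP_EXIT);  /* never reached: partner missing */
        return NULL;
}

int main(void)
{
        struct multi_stop_data d = { .num_threads = 2 };
        pthread_t t;

        set_state(&d, STOP_PREPARE);
        /* Only one of the two expected participants is ever started: */
        pthread_create(&t, NULL, participant, &d);
        pthread_join(t, NULL);   /* hangs: thread_ack never reaches zero */
        puts("done");            /* never printed */
        return 0;
}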
So, actually this doesn't fix the bug and it _seems_ to be reproducible.

[ FWIW, I will be offline for the next two weeks ]

The bug was reproduced with your patch applied to 4.2.0 (+ a couple of
unrelated internal patches). In addition I cherry-picked these two
upstream commits:

  dd9d3843755d "sched: Fix cpu_active_mask/cpu_online_mask race"
  02cb7aa923ec "stop_machine: Move 'cpu_stopper_task' and 'stop_cpus_work' into 'struct cpu_stopper'"

The new dump again shows one cpu looping in multi_cpu_stop(), triggered by
stop_two_cpus(), while the second one will never enter multi_cpu_stop()
since the corresponding cpu_stop_work was never enqueued.

The two cpu_stop_work on the stack of the process that invoked
stop_two_cpus() look like this:

crash> struct cpu_stop_work 0x8ad8afa78
struct cpu_stop_work {
  list = {
    next = 0x8ad8afa78,
    prev = 0x8ad8afa78
  },
  fn = 0x2091b0 <multi_cpu_stop>,
  arg = 0x8ad8afac8,
  done = 0x8ad8afaf0
}

crash> struct cpu_stop_work 0x8ad8afaa0
struct cpu_stop_work {
  list = {
    next = 0x0,    <---- NULL indicates it was never enqueued
    prev = 0x0
  },
  fn = 0x2091b0 <multi_cpu_stop>,
  arg = 0x8ad8afac8,
  done = 0x8ad8afaf0
}

The corresponding struct cpu_stop_done below indicates that at least for
one of them cpu_stop_signal_done() was called (nr_todo == 1). So the idea
is still that this happened when cpu_stop_queue_work() was being called,
but the corresponding stopper was not enabled (see the paraphrased sketch
of that path after the dump).

crash> struct -x cpu_stop_done 00000008ad8afaf0
struct cpu_stop_done {
  nr_todo = {
    counter = 0x1
  },
  executed = 0x0,
  ret = 0x0,
  completion = {
    done = 0x0,
    wait = {
      lock = {
        {
          rlock = {
            raw_lock = {
              lock = 0x0
            },
            break_lock = 0x0,
            magic = 0xdead4ead,
            owner_cpu = 0xffffffff,
            owner = 0xffffffffffffffff,
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          },
          {
            __padding = "\000\000\000\000\000\000\000\000 ޭN\255\377\377\377\377\377\377\377\377\377\377\377\377",
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          }
        }
      },
      task_list = {
        next = 0x8ad8afa20,
        prev = 0x8ad8afa20
      }
    }
  }
}
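If I read the 4.2-era queueing path correctly, it has roughly the shape
below (paraphrased from memory, not a verbatim quote of
kernel/stop_machine.c), which would produce exactly the picture in the
dumps: list.next == 0x0 for the work that was never queued, nr_todo
dropped to 1 and executed still 0, while the other cpu keeps spinning in
multi_cpu_stop() waiting for a partner that never arrives.

/*
 * Paraphrase of the 4.2-era cpu_stop_queue_work(): if the target CPU's
 * stopper is not enabled yet, the work is never added to stopper->works
 * -- its list pointers stay NULL -- and its done is signalled right
 * away with executed == false.
 */
static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
{
        struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
        unsigned long flags;

        spin_lock_irqsave(&stopper->lock, flags);
        if (stopper->enabled) {
                /* normal case: queue the work and kick the stopper thread */
                list_add_tail(&work->list, &stopper->works);
                wake_up_process(stopper->thread);
        } else {
                /* stopper not enabled: signal completion without executing */
                cpu_stop_signal_done(work->done, false);
        }
        spin_unlock_irqrestore(&stopper->lock, flags);
}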