On 04/02/2015 04:12 PM, Ingo Molnar wrote:
>
> * Preeti U Murthy <pre...@linux.vnet.ibm.com> wrote:
>
>> It was found when doing a hotplug stress test on POWER that the machine
>> either hit softlockups or rcu_sched stall warnings. The issue was
>> traced to commit 7cba160ad789a ("powernv/cpuidle: Redesign idle states
>> management"), which exposed the cpu_down() race with hrtimer based
>> broadcast mode (commit 5d1638acb9f6, "tick: Introduce hrtimer based
>> broadcast"). This is explained below.
>>
>> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before
>> it is taken down.
>>
>>         CPU0                            CPU1
>>
>>         cpu_down()                      take_cpu_down()
>>                                         disable_interrupts()
>>
>>         cpu_die()
>>
>>         while (CPU1 != CPU_DEAD) {
>>                 msleep(100);
>>                 switch_to_idle();
>>                 stop_cpu_timer();
>>                 schedule_broadcast();
>>         }
>>
>>         tick_cleanup_cpu_dead()
>>                 take_over_broadcast()
>>
>> So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer
>> anymore, so CPU0 will be stuck forever.
>>
>> Fix this by explicitly taking over broadcast duty before cpu_die().
>> This is a temporary workaround. What we really want is a callback in the
>> clockevent device which allows us to do that from the dying CPU by
>> pushing the hrtimer onto a different cpu. That might involve an IPI and
>> is definitely more complex than this immediate fix.
>
> So why not use a suitable CPU_DOWN* notifier for this, instead of open
> coding it all into a random place in the hotplug machinery?
This is because each of them is unsuitable for a reason:

1. The CPU_DOWN_PREPARE stage allows for failure: the cpu in question may
   not successfully go down, so we may pull the hrtimer unnecessarily.

2. CPU_DYING notifiers are run on the cpu that is going down, so the
   alternative would be to IPI an online cpu to take up the broadcast duty.

3. The CPU_DEAD and CPU_POST_DEAD stages both have the drawback described
   in the changelog: they run only after cpu_die() has finished waiting for
   the dead cpu, which is exactly where CPU0 gets stuck.

A rough sketch of what such a takeover could look like, placed directly on
the cpu_down() path, is appended at the end of this mail for illustration.

I hope I got your question right.

Regards
Preeti U Murthy

>
> Also, I improved the changelog (attached below), but decided against
> applying it until these questions are cleared - please use that for
> future versions of this patch.
>
> Thanks,
>
>         Ingo
>
> ===================>
>
> From 413fbf5193b330c5f478ef7aaeaaee08907a993e Mon Sep 17 00:00:00 2001
> From: Preeti U Murthy <pre...@linux.vnet.ibm.com>
> Date: Mon, 30 Mar 2015 14:59:19 +0530
> Subject: [PATCH] clockevents: Fix cpu_down() race for hrtimer based
>  broadcasting
>
> It was found when doing a hotplug stress test on POWER, that the
> machine either hit softlockups or rcu_sched stall warnings. The
> issue was traced to commit:
>
>   7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
>
> which exposed the cpu_down() race with hrtimer based broadcast mode:
>
>   5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
>
> The race is the following:
>
> Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
> before it is taken down.
>
>         CPU0                            CPU1
>
>         cpu_down()                      take_cpu_down()
>                                         disable_interrupts()
>
>         cpu_die()
>
>         while (CPU1 != CPU_DEAD) {
>                 msleep(100);
>                 switch_to_idle();
>                 stop_cpu_timer();
>                 schedule_broadcast();
>         }
>
>         tick_cleanup_cpu_dead()
>                 take_over_broadcast()
>
> So after CPU1 disabled interrupts it cannot handle the broadcast
> hrtimer anymore, so CPU0 will be stuck forever.
>
> Fix this by explicitly taking over broadcast duty before cpu_die().
>
> This is a temporary workaround. What we really want is a callback
> in the clockevent device which allows us to do that from the dying
> CPU by pushing the hrtimer onto a different cpu. That might involve
> an IPI and is definitely more complex than this immediate fix.
>
> Changelog was picked up from:
>
>     https://lkml.org/lkml/2015/2/16/213
>
> Suggested-by: Thomas Gleixner <t...@linutronix.de>
> Tested-by: Nicolas Pitre <n...@linaro.org>
> Signed-off-by: Preeti U. Murthy <pre...@linux.vnet.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: m...@ellerman.id.au
> Cc: nicolas.pi...@linaro.org
> Cc: pet...@infradead.org
> Cc: r...@rjwysocki.net
> Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
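For illustration, a minimal sketch of what pulling the broadcast duty onto
the CPU driving cpu_down() might look like, done just before __cpu_die().
The names hotplug_cpu__broadcast_tick_pull(), broadcast_needs_cpu(),
tick_broadcast_device and tick_broadcast_lock are assumptions made for
this example, not a final interface; the only point is that the surviving
CPU reprograms the broadcast clock event device itself once the dying CPU
can no longer service the hrtimer:

        /* kernel/cpu.c: on the _cpu_down() path (sketch) */

        /*
         * Illustrative: pull the broadcast tick over to this CPU before
         * waiting for the outgoing CPU to die, so that the wait can make
         * progress even if the outgoing CPU owned the broadcast hrtimer.
         */
        hotplug_cpu__broadcast_tick_pull(cpu);

        /* This actually kills the CPU: */
        __cpu_die(cpu);

        /* kernel/time/tick-broadcast.c (sketch, names are illustrative) */
        void hotplug_cpu__broadcast_tick_pull(int deadcpu)
        {
                struct clock_event_device *bc;
                unsigned long flags;

                raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
                bc = tick_broadcast_device.evtdev;

                /*
                 * If the outgoing CPU was bound to the hrtimer based
                 * broadcast device, reprogramming the device from here
                 * moves the broadcast assignment to this CPU.
                 */
                if (bc && broadcast_needs_cpu(bc, deadcpu))
                        clockevents_program_event(bc, bc->next_event, 1);

                raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
        }

This keeps the takeover out of the notifier chains entirely: it runs on a
CPU that is guaranteed to stay online, before cpu_die() starts polling for
the dead CPU, which is exactly the window in which the hang above occurs.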