2013/2/4 Vincent Guittot <vincent.guit...@linaro.org>:
> On 1 February 2013 19:03, Frederic Weisbecker <fweis...@gmail.com> wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 257002c..fd41924 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>>>
>>>         update_group_power(sd, cpu);
>>>         atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>>> +       clear_bit(NOHZ_IDLE, nohz_flags(cpu));
>>
>> So that's a real issue indeed. nr_busy_cpus was never correct.
>>
>> Now I'm still a bit worried with this solution. What if an idle task
>> started in smp_init() has not yet stopped its tick, but is about to do
>> so? The domains are not yet available to the task but the nohz flags
>> are. When it later restarts the tick, it's going to erroneously
>> increase nr_busy_cpus.
>
> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in
> init_sched_groups_power instead of setting them as it is done now. If
> a CPU enters idle during the init sequence, the flag is already
> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared
> while a NULL sched_domain is attached to the CPU thanks to patch 2.
> This should solve all use cases ?

This may work on smp_init(). But the per-CPU domain can be changed
concurrently at any time on CPU hotplug, with a new sched group power
struct, right? What if the following happens (inventing function names,
but you get the idea):

CPU 0                                   CPU 1

dom = new_domain(...) {
        nr_cpus_busy = 0;
        set_idle(CPU 1);
                                        old_dom = get_dom()
                                        clear_idle(CPU 1)
}
rcu_assign_pointer(cpu1_dom, dom);

Can this scenario happen?

>>
>> It probably won't happen in practice. But then there is more: sched
>> domains can be concurrently rebuild anytime, right? So what if we
>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>> domain is switched concurrently. Are we having a new sched group along
>> the way? If so we have a bug here as well because we can have
>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>
> When the sched_domain are rebuilt, we set a null sched_domain during
> the rebuild sequence and a new sched_group_power is created as well

So at that time we may race with a CPU setting/clearing its NOHZ_IDLE
flag, as in my scenario above?
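
For illustration only, the interleaving in the scenario above can be replayed deterministically in userspace. This is a minimal sketch, not kernel code: struct domain, struct cpu_state, set_idle(), clear_idle() and replay_race() are all hypothetical stand-ins, and it assumes that the flag helpers only touch the counter when the flag actually changes, and that new_domain() runs its two steps on CPU 0 before CPU 1 reads the old domain pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-domain busy accounting. */
struct domain {
    int nr_busy_cpus;
};

/* Hypothetical stand-in for CPU 1's nohz flag and its domain pointer. */
struct cpu_state {
    bool nohz_idle;
    struct domain *dom;
};

/* Assumed behaviour: account only when the flag really changes. */
static void clear_idle(struct cpu_state *cpu, struct domain *dom)
{
    if (!cpu->nohz_idle)
        return;
    cpu->nohz_idle = false;
    if (dom != NULL)
        dom->nr_busy_cpus++;
}

static void set_idle(struct cpu_state *cpu, struct domain *dom)
{
    if (cpu->nohz_idle)
        return;
    cpu->nohz_idle = true;
    if (dom != NULL)
        dom->nr_busy_cpus--;
}

struct race_result {
    bool cpu1_flag_idle;          /* flag state once the new domain is published */
    int old_dom_busy;             /* counter of the domain being replaced */
    int new_dom_busy;             /* counter of the new domain at publish time */
    int new_dom_busy_after_idle;  /* counter after CPU 1 next goes idle */
};

/* Replays the interleaving step by step on a single thread. */
static struct race_result replay_race(void)
{
    struct domain old_dom = { .nr_busy_cpus = 1 };  /* CPU 1 counted as busy */
    struct domain new_dom;
    struct cpu_state cpu1 = { .nohz_idle = false, .dom = &old_dom };
    struct race_result res;

    /* CPU 0: new_domain() initialises the counter and marks CPU 1 idle. */
    new_dom.nr_busy_cpus = 0;
    cpu1.nohz_idle = true;

    /* CPU 1: still sees the old domain and leaves idle. */
    struct domain *seen = cpu1.dom;
    clear_idle(&cpu1, seen);

    /* CPU 0: publishes the new domain (rcu_assign_pointer() in the scenario). */
    cpu1.dom = &new_dom;

    res.cpu1_flag_idle = cpu1.nohz_idle;
    res.old_dom_busy = old_dom.nr_busy_cpus;
    res.new_dom_busy = new_dom.nr_busy_cpus;

    /* CPU 1: goes idle again, now against the new domain. */
    set_idle(&cpu1, cpu1.dom);
    res.new_dom_busy_after_idle = new_dom.nr_busy_cpus;

    return res;
}
```

Under these assumptions, replay_race() ends with CPU 1 flagged busy while the new domain's counter does not account for it: the increment went to the old domain, so the next idle transition drives the new counter negative.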