Hi, On 11/13/2013 04:53 PM, Srikar Dronamraju wrote: > * Preeti Murthy <preeti.l...@gmail.com> [2013-11-13 16:22:37]: > >> Hi Srikar, >> >> update_group_power() is called only during load balancing during >> update_sg_lb_stats(). >> Load balancing begins at the base domain of the CPU,rq(cpu)->sd. This is >> checked for >> NULL. So how can update_group_power() be called in a scenario where the >> base domain >> of the CPU is not initialized? I say 'initialized' since you check for NULL >> on rq(cpu)->sd. >> > > update_group_power() also gets called from init_sched_groups_power(). > And if you see the oops message, we know that the oops happens from that > path. In build_sched_domains(), we do cpu_attach_domain() what updates > rq->sd after the call to init_sched_groups_power(). So by the time > init_sched_groups_power() is called rq->sd is not yet initialized. > > We only hit oops case, when the sd->flags has SD_OVERLAP set. > > > >> In the changelog, you say 'updated'. Are you saying that it has a stale >> value? > > I said, "before the sched_domain for a cpu is updated", so its not yet > updated or has stale value. As I said earlier in this mail, the > initialization happens after we do a update_group_power(). > >> Please do elaborate on how you observed this. >> > > Does this clarify?
Yes it clarifies, thank you. However I was thinking that a better fix would be to reorder the way we call update_group_power() and cpu_attach_domain(). Why do we need to do update_group_power() of the groups of the sched domains that would probably degenerate in cpu_attach_domain()? So it seemed best to move update_group_power() to after cpu_attach_domain() so that it saves unnecessary iterations over sched domains which could degenerate, and it fixes the issue that you have brought out as well. See below for the patch: ------------------------------------------------------------------------------- sched: Update power of sched groups after sched domains have been attached to CPUs From: Preeti U Murthy <pre...@linux.vnet.ibm.com> Avoid iterating unnecessarily over the sched domains which could potentially degenerate, while updating sched groups' power. This can be done by moving the call to init_sched_groups_power() to after cpu_attach_domain(), when the possibility of degenerating sched domains is examined and appropriately sched domains are degenerated. But claim_allocations() which does a NULL on the struct sd_data members for each sched domain should iterate over all the initally built sched domains. So move claim_allocations() to a loop where we build sched groups for each domain. We would not require to reference sd_data after sched domains and sched groups have been built. Another use of this re-ordering is with reference to the commit "sched/fair: Fix group power_orig computation". After this commit, we end up dereferencing cpu_rq(cpu)->sd in update_group_power(). This would lead to a NULL pointer since this parameter is updated after call to update_group_power() in build_sched_domains() during initialization of sched domains per CPU. The below change prevents this from occuring. Signed-off-by: Preeti U. Murthy <pre...@linux.vnet.ibm.com> --- kernel/sched/core.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e6a6244..d9703ac 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6211,17 +6211,7 @@ static int build_sched_domains(const struct cpumask *cpu_map, if (build_sched_groups(sd, i)) goto error; } - } - } - - /* Calculate CPU power for physical packages and nodes */ - for (i = nr_cpumask_bits-1; i >= 0; i--) { - if (!cpumask_test_cpu(i, cpu_map)) - continue; - - for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) { claim_allocations(i, sd); - init_sched_groups_power(i, sd); } } @@ -6233,6 +6223,16 @@ static int build_sched_domains(const struct cpumask *cpu_map, } rcu_read_unlock(); + /* Calculate CPU power for physical packages and nodes */ + for (i = nr_cpumask_bits-1; i >= 0; i--) { + if (!cpumask_test_cpu(i, cpu_map)) + continue; + + for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) + init_sched_groups_power(i, sd); + } + + ret = 0; error: __free_domain_allocs(&d, alloc_state, cpu_map); > Regards Preeti U. Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/