On 04/24/2017 10:03 AM, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 04:51:43PM -0300, Lauro Ramos Venancio wrote:
>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index e77c93a..694e799 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -505,7 +507,11 @@ static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
>>
>>  	for_each_cpu(i, sg_span) {
>>  		sibling = *per_cpu_ptr(sdd->sd, i);
>> -		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
>> +
>> +		if (!sibling->groups)
>> +			continue;
>
> How can this happen?

This happens on machines with asymmetric topologies. For more details see:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1174876874dcf8986806e4dad3d7d07af20b439

>
>> +
>> +		if (!cpumask_equal(sg_span, sched_group_cpus(sibling->groups)))
>>  			continue;
>>
>>  		cpumask_set_cpu(i, sched_group_mask(sg));
>
>> @@ -1482,6 +1502,14 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
>>  		}
>>  	}
>>
>> +	/* Init overlap groups */
>> +	for_each_cpu(i, cpu_map) {
>> +		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
>> +			if (sd->flags & SD_OVERLAP)
>> +				init_overlap_sched_groups(sd);
>> +		}
>> +	}
>
> Why does this have to be a whole new loop? This is because in
> build_group_mask() we could encounter @sibling that were not constructed
> yet?

That is right. We can only build the group mask when all siblings are
constructed.
>
> So this is the primary fix?

Yes.

>
>> +
>>  	/* Calculate CPU capacity for physical packages and nodes */
>>  	for (i = nr_cpumask_bits-1; i >= 0; i--) {
>>  		if (!cpumask_test_cpu(i, cpu_map))
>
> Also, would it not make sense to re-order patch 2 to come after this,
> such that we _do_ have the group_mask available and don't have to jump
> through hoops in order to link up the sgc? Afaict we don't actually use
> the sgc until the above (reverse) loop computing the CPU capacities.