On 01/02/21 16:38, Barry Song wrote:
> @@ -964,6 +941,12 @@ static void init_overlap_sched_group(struct sched_domain *sd,
>
>       build_balance_mask(sd, sg, mask);
>       cpu = cpumask_first_and(sched_group_span(sg), mask);
> +     /*
> +      * for the group generated by grandchild, use the sgc of 2nd cpu
> +      * because the 1st cpu might be used by another sched_group
> +      */
> +     if (from_grandchild && cpumask_weight(mask) > 1)
> +             cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
>
>       sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);

So you are getting a (hopefully) unique ID for this group span at this
given topology level (i.e. sd->private), but as I stated in that list of
issues, this creates an sgc that isn't attached to the local group of any
sched_domain, and thus never gets its capacity values updated.
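
For reference, that refresh only ever happens through the local group -
roughly paraphrasing update_sd_lb_stats() in kernel/sched/fair.c (sketch
from memory, not a verbatim quote):

	struct sched_group *sg = env->sd->groups;

	do {
		int local_group = cpumask_test_cpu(env->dst_cpu,
						   sched_group_span(sg));

		if (local_group) {
			sds->local = sg;
			/*
			 * Only the local group's sgc is refreshed here;
			 * an sgc that is nobody's local group keeps its
			 * build-time value forever.
			 */
			if (env->idle != CPU_NEWLY_IDLE ||
			    time_after_eq(jiffies, sg->sgc->next_update))
				update_group_capacity(env->sd, env->dst_cpu);
		}

		sg = sg->next;
	} while (sg != env->sd->groups);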

This can actually be seen via the capacity values you're getting at build
time:

> [    0.868907] CPU0 attaching sched-domain(s):
...
> [    0.869542]    domain-2: span=0-5 level=NUMA
> [    0.869559]     groups: 0:{ span=0-3 cap=4002 }, 5:{ span=4-5 cap=2048 }
                                                          ^^^^^^^^^^^^^^^^
> [    0.871177] CPU4 attaching sched-domain(s):
...
> [    0.871200]   groups: 4:{ span=4 cap=977 }, 5:{ span=5 cap=1001 }
> [    0.871243]   domain-1: span=4-7 level=NUMA
> [    0.871257]    groups: 4:{ span=4-5 cap=1978 }, 6:{ span=6-7 cap=1968 }
                                ^^^^^^^^^^^^^^^^

Note how that CPU0-domain2-group5 is stuck at its build-time value of
2 * SCHED_CAPACITY_SCALE = 2048, whereas CPU4's domain-1 view of the same
span reflects the actual CPU capacities (977 + 1001 = 1978).

IMO what we want to do here is to hook this CPU0-domain2-group5 up to the
sgc of CPU4-domain1-group4. I've done that in the below diff - this gives us
groups whose sgc is owned at a lower topology level, but this will only ever
be true for non-local groups. It has the added benefit of working with
single-CPU nodes. Briefly tested on your topology and the sunfire's (via
QEMU), and I didn't get screamed at.

Before the fun police come and impound my keyboard, I'd like to point out
that we could leverage this cross-level sgc referencing hack to further
change the NUMA domains and pretty much get rid of overlapping groups
(that's what I was fumbling with in [1]).

[1]: http://lore.kernel.org/r/jhjwnw11ak2.mog...@arm.com

That is, rather than building overlapping groups and fixing them whenever
that breaks (distance > 2), we could have:
- the local group being the child domain's span (as always)
- all non-local NUMA groups spanning a single node each, with the right sgc
  cross-referencing (rough sketch below).
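
Something like the below, which is entirely hypothetical -
build_single_node_group() and lowest_numa_sgc() don't exist, and
remaining_nodes is hand-waved; it's only meant to illustrate the shape of
it:

	/* Local group: the child domain's span, as always. */
	sg = build_group_from_child_sched_domain(sd, cpu);

	/* One group per remaining node, borrowing an existing sgc. */
	for_each_node_mask(node, remaining_nodes) {
		sg = build_single_node_group(node);	/* made up */
		/*
		 * Reference the sgc owned by that node's lowest NUMA
		 * domain - that domain sees it as a local group, so
		 * it'll keep the capacity values fresh for us.
		 */
		sg->sgc = lowest_numa_sgc(node);	/* made up */
		atomic_inc(&sg->sgc->ref);
		/* ... link sg into the sd->groups list ... */
	}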

Thoughts?

--->8---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b748999c9e11..ef43abb6b1fb 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -932,21 +932,15 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 
 static void init_overlap_sched_group(struct sched_domain *sd,
                                     struct sched_group *sg,
-                                    int from_grandchild)
+                                    struct sched_domain *grandchild)
 {
        struct cpumask *mask = sched_domains_tmpmask2;
-       struct sd_data *sdd = sd->private;
+       struct sd_data *sdd = grandchild ? grandchild->private : sd->private;
        struct cpumask *sg_span;
        int cpu;
 
        build_balance_mask(sd, sg, mask);
        cpu = cpumask_first_and(sched_group_span(sg), mask);
-       /*
-        * for the group generated by grandchild, use the sgc of 2nd cpu
-        * because the 1st cpu might be used by another sched_group
-        */
-       if (from_grandchild && cpumask_weight(mask) > 1)
-               cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
 
        sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
        if (atomic_inc_return(&sg->sgc->ref) == 1)
@@ -979,7 +973,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 
        for_each_cpu_wrap(i, span, cpu) {
                struct cpumask *sg_span;
-               int from_grandchild = 0;
+               bool from_grandchild = false;
 
                if (cpumask_test_cpu(i, covered))
                        continue;
@@ -1033,7 +1027,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
                       !cpumask_subset(sched_domain_span(sibling->child),
                                       span)) {
                        sibling = sibling->child;
-                       from_grandchild = 1;
+                       from_grandchild = true;
                }
 
                sg = build_group_from_child_sched_domain(sibling, cpu);
@@ -1043,7 +1037,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
                sg_span = sched_group_span(sg);
                cpumask_or(covered, covered, sg_span);
 
-               init_overlap_sched_group(sd, sg, from_grandchild);
+               init_overlap_sched_group(sd, sg, from_grandchild ? sibling : NULL);
 
                if (!first)
                        first = sg;
