On 14/09/20 11:03, Vincent Guittot wrote:
> The 25% default imbalance threshold for the DIE and NUMA domains is
> large enough to generate significant unfairness between threads. A
> typical example is the case of 11 threads running on 2x4 CPUs. The
> imbalance of 20% between the 2 groups of 4 cores is just low enough
> not to trigger the load balance between the 2 groups. We will always
> have the same 6 threads on one group of 4 CPUs and the other 5 threads
> on the other group of CPUs. With fair time sharing in each group, we
> end up with +20% running time for the group of 5 threads.
>
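To spell out the +20% figure: with fair sharing inside a group, each
thread's CPU share is nr_cpus / nr_threads, i.e. 4/6 ≈ 66.7% on the
busier group vs 4/5 = 80% on the other, and 0.8 / 0.667 ≈ 1.2. A
trivial userspace check of that arithmetic (my own sketch, not kernel
code):

	#include <stdio.h>

	int main(void)
	{
		double share_busy  = 4.0 / 6.0;	/* 6 threads on 4 CPUs: ~66.7% each */
		double share_light = 4.0 / 5.0;	/* 5 threads on 4 CPUs: 80% each */

		/* prints ~1.20, i.e. +20% running time for the group of 5 */
		printf("ratio: %.2f\n", share_light / share_busy);
		return 0;
	}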
AIUI this is the culprit:

	if (100 * busiest->avg_load <=
			env->sd->imbalance_pct * local->avg_load)
		goto out_balanced;

In your 6-vs-5 example the tipping point is imbalance_pct=120, so the
default of 125 sends us to out_balanced. Now, ultimately this would need
to scale with the underlying topology, right? On a system with 2x32
cores running {33 threads, 34 threads}, the tipping point drops to
imbalance_pct≈103; but then, with that many more cores, it is somewhat
questionable whether a one-task difference is worth balancing at all.
(A quick sketch of both cases follows the quoted diff.)

> Consider decreasing the imbalance threshold for the overloaded case,
> where we use the load to balance tasks and to ensure fair time
> sharing.
>
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> ---
>  kernel/sched/topology.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9079d865a935..1a84b778755d 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1337,7 +1337,7 @@ sd_init(struct sched_domain_topology_level *tl,
>  		.min_interval		= sd_weight,
>  		.max_interval		= 2*sd_weight,
>  		.busy_factor		= 32,
> -		.imbalance_pct		= 125,
> +		.imbalance_pct		= 117,
>
>  		.cache_nice_tries	= 0,
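For concreteness, here's a tipping-point sketch (again a userspace toy,
assuming avg_load scales with the number of runnable threads when both
groups have equal capacity). It shows 117 does trigger balancing in the
6-vs-5 case but still leaves the 34-vs-33 one alone:

	#include <stdio.h>

	/* Mirrors the out_balanced check with avg_load ~ nr_threads. */
	static int balances(int pct, int nr_busiest, int nr_local)
	{
		return !(100 * nr_busiest <= pct * nr_local);
	}

	int main(void)
	{
		/* 2x4 CPUs, 11 threads: tipping point is pct=120 */
		printf("6v5:   125 -> %d, 117 -> %d\n",
		       balances(125, 6, 5), balances(117, 6, 5));
		/* 2x32 cores, 67 threads: tipping point is pct~=103 */
		printf("34v33: 125 -> %d, 117 -> %d\n",
		       balances(125, 34, 33), balances(117, 34, 33));
		return 0;
	}

which prints 0, 1 for the first case and 0, 0 for the second.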