Hi Steve,

On 09/11/2018 12:50, Steve Sistare wrote:
> From: Steve Sistare <steve.sist...@oracle.com>
>
> Define and initialize a sparse bitmap of overloaded CPUs, per
> last-level-cache scheduling domain, for use by the CFS scheduling class.
> Save a pointer to cfs_overload_cpus in the rq for efficient access.
>
> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> ---
>  include/linux/sched/topology.h |  1 +
>  kernel/sched/sched.h           |  2 ++
>  kernel/sched/topology.c        | 21 +++++++++++++++++++--
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 6b99761..b173a77 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -72,6 +72,7 @@ struct sched_domain_shared {
>  	atomic_t	ref;
>  	atomic_t	nr_busy_cpus;
>  	int		has_idle_cores;
> +	struct sparsemask *cfs_overload_cpus;
Thinking about misfit stealing, we can't use the sd_llc_shared's because
on big.LITTLE misfit migrations happen across LLC domains.

I was thinking of adding a misfit sparsemask to the root_domain, but then
I thought we could do the same thing for cfs_overload_cpus.

By doing so we'd have a single source of information for overloaded CPUs,
and we could filter that down during idle balance - you mentioned earlier
wanting to try stealing at each SD level. This would also let you get rid
of [PATCH 02].

The main part of try_steal() could then be written as something like this:

----->8-----

for_each_domain(this_cpu, sd) {
	span = sched_domain_span(sd);

	for_each_sparse_wrap(src_cpu, overload_cpus) {
		if (cpumask_test_cpu(src_cpu, span) &&
		    steal_from(dst_rq, dst_rf, &locked, src_cpu)) {
			stolen = 1;
			goto out;
		}
	}
}

------8<-----

We could limit the stealing to stop at the highest SD_SHARE_PKG_RESOURCES
domain for now so there would be no behavioural change - but we'd
factorize the #ifdef CONFIG_SCHED_SMT bit. Furthermore, the door would be
open to further stealing.
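To make that concrete, here's a rough (untested) variant of the above
sketch with that stop condition folded in - steal_from(),
for_each_sparse_wrap() and the dst_rq/dst_rf/locked locals are just
reused from the snippet above, so treat the names as placeholders:

----->8-----

for_each_domain(this_cpu, sd) {
	/* Stop above the highest SD_SHARE_PKG_RESOURCES (i.e. LLC) level */
	if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
		break;

	span = sched_domain_span(sd);

	for_each_sparse_wrap(src_cpu, overload_cpus) {
		/* Only consider overloaded CPUs within this domain's span */
		if (cpumask_test_cpu(src_cpu, span) &&
		    steal_from(dst_rq, dst_rf, &locked, src_cpu)) {
			stolen = 1;
			goto out;
		}
	}
}

------8<-----

Relaxing or dropping that flags check later on would be all it takes to
steal from further away.

What do you think?

[...]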