On Fri, May 14, 2021 at 07:14:15PM -0700, Ricardo Neri wrote:
> On Fri, May 14, 2021 at 11:47:45AM +0200, Peter Zijlstra wrote:
> > On Thu, May 13, 2021 at 08:49:08AM -0700, Ricardo Neri wrote:
> > >  include/linux/sched/topology.h |   1 +
> > >  kernel/sched/fair.c            | 101 +++++++++++++++++++++++++++++++++
> > >  2 files changed, 102 insertions(+)
> > > 
> > > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > > index 8f0f778b7c91..43bdb8b1e1df 100644
> > > --- a/include/linux/sched/topology.h
> > > +++ b/include/linux/sched/topology.h
> > > @@ -57,6 +57,7 @@ static inline int cpu_numa_flags(void)
> > >  #endif
> > > 
> > >  extern int arch_asym_cpu_priority(int cpu);
> > > +extern bool arch_asym_check_smt_siblings(void);
> > > 
> > >  struct sched_domain_attr {
> > >  	int relax_domain_level;
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index c8b66a5d593e..3d6cc027e6e6 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -106,6 +106,15 @@ int __weak arch_asym_cpu_priority(int cpu)
> > >  	return -cpu;
> > >  }
> > > 
> > > +/*
> > > + * For asym packing, first check the state of SMT siblings before deciding to
> > > + * pull tasks.
> > > + */
> > > +bool __weak arch_asym_check_smt_siblings(void)
> > > +{
> > > +	return false;
> > > +}
> > > +
> > >  /*
> > >   * The margin used when comparing utilization with CPU capacity.
> > >   *
> > 
> > > @@ -8458,6 +8550,9 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs
> > >  	if (group == sds->local)
> > >  		return false;
> > > 
> > > +	if (arch_asym_check_smt_siblings())
> > > +		return asym_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> > > +
> > >  	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> > 
> > So I'm thinking that this is a property of having ASYM_PACKING at a core
> > level, rather than some arch special. Wouldn't something like this be
> > more appropriate?
> > 
> > ---
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -57,7 +57,6 @@ static inline int cpu_numa_flags(void)
> >  #endif
> > 
> >  extern int arch_asym_cpu_priority(int cpu);
> > -extern bool arch_asym_check_smt_siblings(void);
> > 
> >  struct sched_domain_attr {
> >  	int relax_domain_level;
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -107,15 +107,6 @@ int __weak arch_asym_cpu_priority(int cp
> >  }
> > 
> >  /*
> > - * For asym packing, first check the state of SMT siblings before deciding to
> > - * pull tasks.
> > - */
> > -bool __weak arch_asym_check_smt_siblings(void)
> > -{
> > -	return false;
> > -}
> > -
> > -/*
> >   * The margin used when comparing utilization with CPU capacity.
> >   *
> >   * (default: ~20%)
> > @@ -8550,7 +8541,8 @@ sched_asym(struct lb_env *env, struct sd
> >  	if (group == sds->local)
> >  		return false;
> > 
> > -	if (arch_asym_check_smt_siblings())
> > +	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> > +	    (group->flags & SD_SHARE_CPUCAPACITY))
> >  		return asym_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> 
> Thanks Peter for the quick review! This makes sense to me. The only
> reason we proposed arch_asym_check_smt_siblings() is that we were
> worried about breaking powerpc (I need to study how they set priorities
> for SMT, if applicable). If you think this is not an issue, I can post
> a v4 with this update.
As far as I can see, priorities in powerpc are set by the CPU number.
However, I am not sure how CPUs are enumerated. If the CPUs in brackets
are SMT siblings, does the enumeration look like

A) [0, 1], [2, 3]

or

B) [0, 2], [1, 3]?

I guess B is the right answer. Otherwise, both SMT siblings of a core
would need to be busy before a new core is used.

Still, I think the issue described in the cover letter may be
reproducible on powerpc as well. If CPU3 is offlined, and [0, 2] pulled
tasks from [1, -] so that both CPU0 and CPU2 become busy, CPU1 would not
be able to help, since CPU0 has the highest priority.

I am cc'ing the linuxppc list to get some feedback.

Thanks and BR,
Ricardo