Hi Vincent,

About time I had a look at this one...
On 14/12/2018 16:01, Vincent Guittot wrote:
> In case of active balance, we increase the balance interval to cover
> pinned tasks cases not covered by all_pinned logic.

AFAIUI the balance interval increase is there to have plenty of time to
stop the task before going through another load_balance().

Seeing as there is a cpus_allowed check that leads to

	env.flags |= LBF_ALL_PINNED;
	goto out_one_pinned;

in the active balancing part of load_balance(), the condition you're
changing should never be hit when we have pinned tasks. So you may want
to rephrase that bit.

> Nevertheless, the active
> migration triggered by asym packing should be treated as the normal
> unbalanced case and reset the interval to the default value, otherwise
> active migration for asym_packing can easily be delayed for hundreds of
> ms because of this all_pinned detection mechanism.

Mmm, so it's not exactly clear why we need this change. If we double the
interval because of a pinned task we wanted to active balance, it's just
a regular pinned-task issue and we can't do much about it.

The only scenario I can think of is a workload where you want to do an
active balance in several successive load_balance() calls, in which case
you would keep increasing the interval even though you do migrate a task
every time (which would harm every subsequent active balance).

In that case every active_balance "user" (pressured CPU, misfit) is
affected, so maybe what we want is something like this:

-----8<-----
@@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		sd->balance_interval = sd->min_interval;
 	} else {
 		/*
-		 * If we've begun active balancing, start to back off. This
-		 * case may not be covered by the all_pinned logic if there
-		 * is only 1 task on the busy runqueue (because we don't call
-		 * detach_tasks).
+		 * If we've begun active balancing, start to back off.
+		 * Don't increase too much in case we have more active balances
+		 * coming up.
 		 */
-		if (sd->balance_interval < sd->max_interval)
-			sd->balance_interval *= 2;
+		sd->balance_interval = 2 * sd->min_interval;
 	}

 	goto out;
----->8-----

Maybe we want a larger threshold - truth be told, it all depends on how
long the cpu stopper can take and whether that delay increase is still
relevant nowadays.

> 
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> ---
>  kernel/sched/fair.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9591e7a..487287e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>   */
>  #define MAX_PINNED_INTERVAL	512
>  
> +static inline bool
> +asym_active_balance(struct lb_env *env)
> +{
> +	/*
> +	 * ASYM_PACKING needs to force migrate tasks from busy but
> +	 * lower priority CPUs in order to pack all tasks in the
> +	 * highest priority CPUs.
> +	 */
> +	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> +	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +}
> +
>  static int need_active_balance(struct lb_env *env)
>  {
>  	struct sched_domain *sd = env->sd;
>  
> -	if (env->idle != CPU_NOT_IDLE) {
> -
> -		/*
> -		 * ASYM_PACKING needs to force migrate tasks from busy but
> -		 * lower priority CPUs in order to pack all tasks in the
> -		 * highest priority CPUs.
> -		 */
> -		if ((sd->flags & SD_ASYM_PACKING) &&
> -		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
> -			return 1;
> -	}
> +	if (asym_active_balance(env))
> +		return 1;
> 
>  	/*
>  	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  	} else
>  		sd->nr_balance_failed = 0;
>  
> -	if (likely(!active_balance)) {
> +	if (likely(!active_balance) || asym_active_balance(&env)) {

AFAICT this makes us reset the interval whenever we do an asym packing
active balance (if the task is pinned we would goto out_one_pinned).
This goes against the logic I had understood so far and that I explained
higher up.

>  		/* We were unbalanced, so reset the balancing interval */
>  		sd->balance_interval = sd->min_interval;
>  	} else {
> 