On 18/12/2018 13:23, Vincent Guittot wrote: [...] >> Ah, I think I get it: you're saying that this balance_interval increase >> is done because it is always assumed we do an active balance with >> busiest->nr_running > 1 && pinned tasks, and that all that is left to >> migrate after the active_balance is pinned tasks that we can't actually >> migrate. >> >> Maybe that's what we should target (as always, totally untested): >> >> -----8<----- >> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq >> *this_rq, >> } else >> sd->nr_balance_failed = 0; >> >> - if (likely(!active_balance)) { >> - /* We were unbalanced, so reset the balancing interval */ >> - sd->balance_interval = sd->min_interval; >> - } else { >> + if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) { > > So it's not exactly LBF_ALL_PINNED but also some pinned tasks >
Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top of the 'if (busiest->nr_running > 1)' block and cleared whenever we find at least one task we could pull, even if we don't pull it because of other reasons in can_migrate_task() (e.g. cache hotness). If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't increase the balance_interval, which is what we would want to maintain. > but IIUC, you would like to extend the reset of balance interval to > all the "good" reasons to trig active load balance Right > In fact the condition used in need_active_balance except the last > remaining one. Good reasons are: > - asym packing > - /* > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task. > * It's worth migrating the task if the src_cpu's capacity is reduced > * because of other sched_class or IRQs if more capacity stays > * available on dst_cpu. > */ > - misfit task > > I can extend the test for these conditions So that's all the need_active_balance() cases except the last sd->nr_balance_failed one. I'd argue this could also be counted as a "good" reason to active balance which shouldn't lead to a balance_interval increase. Plus, it keeps to the logic of increasing the balance_interval only when task affinity gets in the way.