On 14/12/2018 16:01, Vincent Guittot wrote:
> When check_asym_packing() is triggered, the imbalance is set to :
> busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
> busiest_stat.avg_load also comes from a division and the final rounding
> can make imbalance slightly lower than the weighted load of the cfs_rq.
> But this is enough to skip the rq in find_busiest_queue and prevents asym
> migration to happen.
>
> Add 1 to the avg_load to make sure that the targeted cpu will not be
> skipped unexpectidly.
>
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ca46964..c215f7a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8215,6 +8215,12 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  	/* Adjust by relative CPU capacity of the group */
>  	sgs->group_capacity = group->sgc->capacity;
>  	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
> +	/*
> +	 * Prevent division rounding to make the computation of imbalance
> +	 * slightly less than original value and to prevent the rq to be then
> +	 * selected as busiest queue
> +	 */
> +	sgs->avg_load += 1;
I've tried investigating this off-by-one issue by running lmbench, but it
seems to be a gnarly one.

The adventure starts in update_sg_lb_stats(), where we compute
sgs->avg_load:

	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;

Then we drop by find_busiest_group() and call check_asym_packing(), which,
if we do need to pack, does:

	env->imbalance = DIV_ROUND_CLOSEST(
		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
		SCHED_CAPACITY_SCALE);

And finally, the one check that seems to be failing, and that you're
addressing with a +1, is in find_busiest_queue():

	if (rq->nr_running == 1 && wl > env->imbalance &&
	    !check_cpu_capacity(rq, env->sd))
		continue;

Now, running lmbench on my HiKey960 hacked up to use asym packing, I get an
example where there's a task that should be migrated via asym packing but
isn't:

	# This is a DIE level balance
	update_sg_lb_stats():
		cpu=0 load=1
		cpu=1 load=1023
		cpu=2 load=0
		cpu=3 load=0

		avg_load=568

	# Busiest group is [0-3]
	find_busiest_group(): check_asym_packing():
		env->imbalance = 1022

	find_busiest_queue():
		cpu=0 wl=1
		cpu=1 wl=1023
		cpu=2 wl=0
		cpu=3 wl=0

Only CPU1 has rq->nr_running > 0 (== 1, actually), but wl > env->imbalance,
so we skip it in find_busiest_queue().

Now, I thought it was due to IRQ/rt pressure - 1840 isn't the maximum group
capacity for the LITTLE cluster, it should be (463 * 4) == 1852. I got
curious and threw random group capacity values into that equation while
keeping the same avg_load [1], and it gets wild: you get a signal that
oscillates between 1014 and 1024!

[1]: https://gist.github.com/valschneider/f42252e83be943d276b901f57734b8ba

Adding +1 to avg_load doesn't solve those pesky rounding errors, which get
worse as group_capacity grows, but it does reverse the trend: the signal
then oscillates above 1024.

So in the end a +1 *could* "fix" this, but:

a) It'd make more sense to have it in the check_asym_packing() code
b) It's ugly, and it makes me feel like this piece of code is flimsy AF.
>
>  	if (sgs->sum_nr_running)
>  		sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
>