On Fri, Jul 28, 2017 at 01:16:24PM +0100, Dietmar Eggemann wrote:
> >> IIRC the topology you had in mind was MC + DIE level with n (n > 2) DIE
> >> level sched groups.
> > 
> > That'd be a NUMA box?
> 
> I don't think it's NUMA. SD levels are MC and DIE w/ # DIE sg's >> 2.

Ah, I can't read. I thought >2 DIEs.

> > So this is 4 * 18 * 2 = 144 cpus:
> 
> Impressive ;-)

Takes forever to boot though :/

> > If I then start a 3rd loop, I see 100%, 50%, 50%. I then kill the 100%.
> > Then instantly they balance and I get 2x100% back.
> 
> Yeah, could reproduce on IVB-EP (2x10x2).

OK, I have one of those. What should I do to reproduce it? Because I
didn't actually see anything odd.

> > Anything else I need to reproduce? (other than maybe a slightly less
> > insane machine :-)
> 
> I guess what Jeff is trying to avoid is that 'busiest->load_per_task',
> lowered to 'sds->avg_load' in case of an imbalanced busiest sg:
> 
>   if (busiest->group_type == group_imbalanced)
>     busiest->load_per_task = min(busiest->load_per_task, sds->avg_load);
> 
> is so low that fix_small_imbalance() won't be called later and
> 'env->imbalance' stays so low that load-balancing one 50% task to the
> now idle cpu won't happen.
> 
>   if (env->imbalance < busiest->load_per_task)
>     fix_small_imbalance(env, sds);
> 
> Having a really large number of otherwise idle DIE sg's helps keep
> 'sds->avg_load' low in comparison to 'busiest->load_per_task'.

Right, but the whole load_per_task thing is a bit wonky, and since
that's the basis of fix_small_imbalance() I'm very suspicious of it.
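
To make that failure mode concrete, here is a minimal stand-alone sketch
of the check in question. It is loosely modelled on the
calculate_imbalance()/fix_small_imbalance() snippets quoted above, not
the actual kernel code; the struct layout and all load numbers are made
up purely for illustration.

/*
 * Stand-alone sketch: clamping busiest->load_per_task to a low
 * sds->avg_load (dragged down by many idle DIE sg's) keeps the
 * fix_small_imbalance() path from being taken, so the small imbalance
 * is never bumped up enough to move a 50% task.
 */
#include <stdio.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

enum group_type { group_other, group_imbalanced };

struct sg_stats {
	enum group_type group_type;
	unsigned long load_per_task;	/* avg task load in the busiest sg */
};

struct sd_stats {
	struct sg_stats busiest;
	unsigned long avg_load;		/* domain-wide average load */
};

int main(void)
{
	/*
	 * Pretend the busiest sg runs one 100% and two 50% tasks, so its
	 * load_per_task is roughly 2048/3 ~= 683, while the domain-wide
	 * average is dragged way down by all the otherwise idle sg's.
	 */
	struct sd_stats sds = {
		.busiest = {
			.group_type	= group_imbalanced,
			.load_per_task	= 683,
		},
		.avg_load = 120,
	};
	unsigned long imbalance = 200;	/* whatever was computed earlier */

	if (sds.busiest.group_type == group_imbalanced)
		sds.busiest.load_per_task =
			MIN(sds.busiest.load_per_task, sds.avg_load);

	/*
	 * load_per_task got clamped to 120, so imbalance (200) is no
	 * longer below it and fix_small_imbalance() is skipped; the
	 * imbalance stays too small to pull one of the 50% tasks over.
	 */
	if (imbalance < sds.busiest.load_per_task)
		printf("would call fix_small_imbalance()\n");
	else
		printf("fix_small_imbalance() skipped, imbalance stays %lu\n",
		       imbalance);

	return 0;
}

Bump avg_load above 200 (i.e. fewer idle sg's pulling the average down)
and the fix_small_imbalance() branch is taken again, which is the
difference being pointed at above.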

