> -----Original Message-----
> From: Meelis Roos [mailto:mr...@linux.ee]
> Sent: Thursday, February 4, 2021 12:58 AM
> To: Song Bao Hua (Barry Song) <song.bao....@hisilicon.com>;
> valentin.schnei...@arm.com; vincent.guit...@linaro.org; mgor...@suse.de;
> mi...@kernel.org; pet...@infradead.org; dietmar.eggem...@arm.com;
> morten.rasmus...@arm.com; linux-kernel@vger.kernel.org
> Cc: linux...@openeuler.org; xuwei (O) <xuw...@huawei.com>; Liguozhu (Kenneth)
> <liguo...@hisilicon.com>; tiantao (H) <tiant...@hisilicon.com>; wanghuiqiang
> <wanghuiqi...@huawei.com>; Zengtao (B) <prime.z...@hisilicon.com>; Jonathan
> Cameron <jonathan.came...@huawei.com>; guodong...@linaro.org
> Subject: Re: [PATCH v2] sched/topology: fix the issue groups don't span
> domain->span for NUMA diameter > 2
> 
> 03.02.21 13:12 Barry Song wrote:
> > kernel/sched/topology.c | 85 +++++++++++++++++++++++++----------------
> >   1 file changed, 53 insertions(+), 32 deletions(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5d3675c7a76b..964ed89001fe 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> 
> This one still works on the Sun X4600-M2, on top of 
> v5.11-rc6-55-g3aaf0a27ffc2.
> 
> 
> Performance-wise - is there some simple benchmark to run to measure the impact?
> Compared to what - 5.10.0 or the kernel with the warning?

Hi Meelis,
Thanks for retesting.

Comparing against the kernel with the warning is enough. As I mentioned here:
https://lore.kernel.org/lkml/20210115203632.34396-1-song.bao....@hisilicon.com/

I have seen two major issues caused by the broken sched_group:

* in load_balance() and find_busiest_group()
the kernel calculates avg_load and group_type as:

sum(load of cpus within sched_domain)
------------------------------------
capacity of the whole sched_group

Since the sched_group isn't a subset of the sched_domain, the load
of the problematic group is severely underestimated.

sched_domain

  +----------------------------------+
  |                                  |
  |          +-------------------------------------------+
  |          | +-------+  +------+   |                   |
  |          | | cpu0  |  | cpu1 |   |                   |
  |          | +-------+  +------+   |                   |
  +----------------------------------+                   |
             |                                           |
             |      +-------+      +-------+             |
             |      |cpu2   |      |cpu3   |             |
             |      +-------+      +-------+             |
             |                                           |
             +-------------------------------------------+
                            problematic  sched_group


For the above example, the kernel divides "the sum of the load of
cpu0 and cpu1" by "the capacity of the whole group including
cpu0, 1, 2 and 3", so (with equal per-cpu capacities) the group
reports only half of its real average load.
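
To make the arithmetic concrete, here is a minimal standalone sketch
of that division (plain userspace C, not the kernel code; the load
and capacity numbers are made up for illustration):

#include <stdio.h>

int main(void)
{
	/* cpu0..cpu3, all equally busy; the sched_domain spans cpu0,1 only */
	unsigned long load[4] = { 1024, 1024, 1024, 1024 };
	unsigned long cpu_capacity = 1024;

	/* numerator: load summed over the sched_domain (cpu0 + cpu1) */
	unsigned long domain_load = load[0] + load[1];

	/* divisor: capacity of the whole broken group (cpu0..cpu3) */
	unsigned long group_capacity = 4 * cpu_capacity;

	/* avg_load-style scaling: load * SCHED_CAPACITY_SCALE / capacity */
	unsigned long avg_load = domain_load * 1024 / group_capacity;

	/* prints 512; with the correct 2-cpu capacity it would be 1024 */
	printf("avg_load = %lu\n", avg_load);
	return 0;
}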

* in select_task_rq_fair() and find_idlest_group()
The kernel can push a forked/exec-ed task outside the sched_domain
but still inside the sched_group. For the above diagram, while the
kernel wants to find the idlest cpu within the sched_domain, it can
end up picking cpu2 or cpu3, as the sketch below shows.
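
Again a plain userspace sketch with made-up masks and loads, just to
illustrate the search: an idlest-cpu scan that walks the group's span
happily returns a cpu the sched_domain doesn't contain:

#include <stdio.h>

int main(void)
{
	/* bit n set => cpuN is a member of the mask */
	unsigned int domain_span = 0x3;	/* cpu0, cpu1 */
	unsigned int group_span  = 0xf;	/* cpu0..cpu3: the broken group */
	unsigned long load[4] = { 100, 100, 0, 0 };	/* cpu2/cpu3 idle */
	int cpu, idlest = -1;

	/* find_idlest_cpu()-style scan over the group's span... */
	for (cpu = 0; cpu < 4; cpu++) {
		if (!(group_span & (1u << cpu)))
			continue;
		if (idlest < 0 || load[cpu] < load[idlest])
			idlest = cpu;
	}

	/* ...returns cpu2, which lies outside the sched_domain */
	printf("idlest cpu = %d, in domain: %s\n", idlest,
	       (domain_span & (1u << idlest)) ? "yes" : "no");
	return 0;
}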

I guess these two issues can potentially affect many benchmarks.
Our team has seen a 5% unixbench score increase with the fix on
some machines, though the real impact will be case-by-case.

> 
> drop caches and time a build of the Linux kernel with make -j64?
> 
> --
> Meelis Roos

Thanks
Barry
