> -----Original Message-----
> From: Meelis Roos [mailto:mr...@linux.ee]
> Sent: Thursday, February 4, 2021 12:58 AM
> To: Song Bao Hua (Barry Song) <song.bao....@hisilicon.com>;
> valentin.schnei...@arm.com; vincent.guit...@linaro.org; mgor...@suse.de;
> mi...@kernel.org; pet...@infradead.org; dietmar.eggem...@arm.com;
> morten.rasmus...@arm.com; linux-kernel@vger.kernel.org
> Cc: linux...@openeuler.org; xuwei (O) <xuw...@huawei.com>; Liguozhu (Kenneth)
> <liguo...@hisilicon.com>; tiantao (H) <tiant...@hisilicon.com>; wanghuiqiang
> <wanghuiqi...@huawei.com>; Zengtao (B) <prime.z...@hisilicon.com>; Jonathan
> Cameron <jonathan.came...@huawei.com>; guodong...@linaro.org
> Subject: Re: [PATCH v2] sched/topology: fix the issue groups don't span
> domain->span for NUMA diameter > 2
>
> 03.02.21 13:12 Barry Song wrote:
> >  kernel/sched/topology.c | 85 +++++++++++++++++++++++++----------------
> >  1 file changed, 53 insertions(+), 32 deletions(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5d3675c7a76b..964ed89001fe 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
>
> This one still works on the Sun X4600-M2, on top of
> v5.11-rc6-55-g3aaf0a27ffc2.
>
> Performance-wise - is there some simple benchmark to run to measure the
> impact? Compared to what - 5.10.0 or the kernel with the warning?
Hi Meelis,

Thanks for retesting. Comparing against the kernel with the warning is
enough. As I mentioned here:
https://lore.kernel.org/lkml/20210115203632.34396-1-song.bao....@hisilicon.com/
I have seen two major issues caused by the broken sched_group:

* In load_balance() and find_busiest_group(), the kernel calculates
  avg_load and group_type as:

      sum(load of cpus within sched_domain)
      -------------------------------------
        capacity of the whole sched_group

  Since the sched_group isn't a subset of the sched_domain, the load of
  the problematic group is severely underestimated:

     sched_domain
    +----------------------------------+
    |                                  |
    |  +-------------------------------------------+
    |  |   +-------+          +------+ |           |
    |  |   | cpu0  |          | cpu1 | |           |
    |  |   +-------+          +------+ |           |
    +----------------------------------+           |
       |   +-------+     +-------+                 |
       |   | cpu2  |     | cpu3  |                 |
       |   +-------+     +-------+                 |
       +-------------------------------------------+
                          problematic sched_group

  For the above example, the kernel will divide "the sum of the load of
  cpu0 and cpu1" by "the capacity of the whole group including cpu0, 1,
  2 and 3".

* In select_task_rq_fair() and find_idlest_group(), the kernel could
  push a forked/exec-ed task outside the sched_domain but still inside
  the sched_group. For the above diagram, while the kernel wants to find
  the idlest cpu in the sched_domain, it can end up picking cpu2 or
  cpu3. (A standalone sketch of both effects follows at the end of this
  mail.)

I guess these two issues can potentially affect many benchmarks. Our
team has seen a 5% unixbench score increase with the fix on some
machines, though the real impact may vary case by case.

> drop caches and time the build time of linux kernel with make -j64?
>
> --
> Meelis Roos

Thanks
Barry
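P.S. Below is a toy, standalone C program modelling both effects, in
case anyone wants to see the arithmetic. The 4-cpu topology, the load
numbers and the domain_span mask are all made up for illustration; this
is not the kernel's code, only a sketch of what the calculations around
load_balance()/find_busiest_group() and find_idlest_group() effectively
do when the group is not a subset of the domain.

/*
 * Toy model of the two effects above. The topology, the load values
 * and the domain_span mask are hypothetical; the loops only
 * approximate the calculations named in the mail, they are not
 * copied from kernel/sched/fair.c.
 */
#include <stdio.h>

#define NR_CPUS			4
#define SCHED_CAPACITY_SCALE	1024UL

int main(void)
{
	/* every cpu equally busy (hypothetical load values) */
	unsigned long load[NR_CPUS] = { 800, 800, 800, 800 };

	/* cpu0,1 are in both the sched_domain and the broken group;
	 * cpu2,3 are in the group but outside the domain's span */
	int domain_span[NR_CPUS] = { 1, 1, 0, 0 };

	/* Issue 1: group load is summed over the domain's span only... */
	unsigned long group_load = 0, group_capacity = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (domain_span[cpu])
			group_load += load[cpu];
		/* ...but capacity is accumulated over the whole group */
		group_capacity += SCHED_CAPACITY_SCALE;
	}
	unsigned long avg_load =
		group_load * SCHED_CAPACITY_SCALE / group_capacity;
	printf("broken group avg_load: %lu (should be 800)\n", avg_load);

	/* Issue 2: an idlest-cpu scan walks the group's cpumask, so a
	 * cpu outside the sched_domain's span can win the selection */
	unsigned long idle_load[NR_CPUS] = { 700, 600, 100, 900 };
	int idlest = 0;
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (idle_load[cpu] < idle_load[idlest])
			idlest = cpu;
	printf("picked cpu%d (inside sched_domain: %s)\n",
	       idlest, domain_span[idlest] ? "yes" : "no");

	return 0;
}

With all four cpus equally loaded, the broken group reports avg_load
400 instead of 800, and the idlest-cpu scan lands on cpu2 outside the
domain, matching the two failure modes described above.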