On 9/30/19 1:46 AM, kernel test robot wrote:
Greetings,
FYI, we noticed a -19.6% regression of stress-ng.madvise.ops_per_sec due to
commit:
commit: 87eaceb3faa59b9b4d940ec9554ce251325d83fe ("mm: thp: make deferred split
shrinker memcg aware")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: stress-ng
on test machine: 72 threads Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz with 192G
memory
with following parameters:
nr_threads: 100%
disk: 1HDD
testtime: 1s
class: vm
ucode: 0x200005e
cpufreq_governor: performance
Thanks for reporting the case. I ran the same test on my VM (24 threads,
48GB) and saw an average ~3% degradation. The test itself may show ~15%
variation in the same environment according to my testing, so I compared
the average result across 10 runs.
In 5.3 the deferred split queue is per node; with this commit it is per memcg.
If the test is run with the default memcg configuration (just the root memcg),
the lock contention may get worse (two locks -> one lock).
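To make the contention point concrete, here is a rough sketch of how the
queue lookup changes with this commit. The names approximate mm/huge_memory.c
around this series; it is an illustration, not the actual diff:

/*
 * Sketch only: field/function names approximate the kernel around this
 * commit, not an exact copy of the change.
 */

/* Before (5.3): one deferred split queue (and one lock) per NUMA node. */
static struct deferred_split *get_queue_per_node(struct page *page)
{
	return &NODE_DATA(page_to_nid(page))->deferred_split_queue;
}

/*
 * After: the queue lives in the memcg. With no child memcgs configured,
 * threads on every node funnel into the root memcg's single queue,
 * i.e. one spinlock instead of one per node.
 */
static struct deferred_split *get_queue_per_memcg(struct page *page)
{
	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;

	if (memcg)
		return &memcg->deferred_split_queue;
	return &NODE_DATA(page_to_nid(page))->deferred_split_queue;
}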
Actually, I had already noticed this issue from a different angle: the patch
changed the queue's NUMA awareness, so the global kswapd may end up reclaiming
THPs from a different node.
I have already come up with a patch which moves the deferred split queue into
memcg->nodeinfo[] to restore NUMA awareness while keeping it memcg aware at
the same time (see the sketch below). With this patch the same test shows an
average ~4% improvement, slightly better than 5.3.
I'm going to post the patch to the mailing list soon.
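The direction looks roughly like the following sketch. It is only an
illustration of the intended layout (struct and field names are assumed from
the existing mem_cgroup_per_node/nodeinfo[] code), not the patch itself:

/*
 * Illustration only: keep the queue memcg aware, but store one instance
 * per node in memcg->nodeinfo[] so reclaim stays NUMA local and the root
 * memcg no longer funnels everything through a single global lock.
 */
struct mem_cgroup_per_node {
	/* ... existing per-node memcg state ... */
	struct deferred_split deferred_split_queue;
};

static struct deferred_split *get_deferred_split_queue(struct page *page)
{
	struct mem_cgroup *memcg = compound_head(page)->mem_cgroup;
	int nid = page_to_nid(page);

	if (memcg)
		return &memcg->nodeinfo[nid]->deferred_split_queue;
	return &NODE_DATA(nid)->deferred_split_queue;
}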
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
vm/gcc-7/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-2019-05-14.cgz/lkp-skl-2sp8/stress-ng/1s/0x200005e
commit:
0a432dcbeb ("mm: shrinker: make shrinker not depend on memcg kmem")
87eaceb3fa ("mm: thp: make deferred split shrinker memcg aware")
0a432dcbeb32edcd 87eaceb3faa59b9b4d940ec9554
---------------- ---------------------------
%stddev %change %stddev
\ | \
6457 -19.5% 5198 stress-ng.madvise.ops
6409 -19.6% 5154 stress-ng.madvise.ops_per_sec
3575 -26.8% 2618 ± 6% stress-ng.mremap.ops
3575 -26.9% 2613 ± 6% stress-ng.mremap.ops_per_sec
15.77 -5.8% 14.85 ± 2% iostat.cpu.user
3427944 ± 4% -9.3% 3109984 meminfo.AnonPages
33658 ± 22% +69791.3% 23524535 ±165% sched_debug.cfs_rq:/.load.max
19951 ± 7% +13.3% 22611 ± 3% softirqs.CPU54.TIMER
109.94 -4.1% 105.41 turbostat.RAMWatt
5.89 ± 62% -3.1 2.78 ±173% perf-profile.calltrace.cycles-pp.page_fault
3.39 ±101% -0.6 2.78 ±173% perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
3.39 ±101% -0.6 2.78 ±173% perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
5.28 ±100% -5.3 0.00 perf-profile.children.cycles-pp.___might_sleep
5.28 ±100% -5.3 0.00 perf-profile.self.cycles-pp.___might_sleep
885704 ± 5% -11.0% 787888 ± 6% proc-vmstat.nr_anon_pages
1.742e+08 ± 5% -9.6% 1.576e+08 ± 2% proc-vmstat.pgalloc_normal
1.741e+08 ± 5% -9.6% 1.575e+08 ± 2% proc-vmstat.pgfree
375505 -19.4% 302688 proc-vmstat.pglazyfree
55236 ± 38% -55.5% 24552 ± 41% proc-vmstat.thp_deferred_split_page
55234 ± 38% -55.6% 24543 ± 41% proc-vmstat.thp_fault_alloc
3218 -19.4% 2595 proc-vmstat.thp_split_page
12163 ± 7% -22.1% 9473 ± 12% proc-vmstat.thp_split_pmd
8193516 +3.2% 8459146 proc-vmstat.unevictable_pgs_scanned
79085 ± 10% -7.8% 72890 interrupts.CAL:Function_call_interrupts
1139 ± 9% -10.7% 1018 ± 3% interrupts.CPU0.CAL:Function_call_interrupts
3596 ± 3% -13.8% 3100 ± 8% interrupts.CPU20.TLB:TLB_shootdowns
3602 ± 4% -12.6% 3149 ± 10% interrupts.CPU23.TLB:TLB_shootdowns
3512 ± 5% -9.9% 3163 ± 9% interrupts.CPU25.TLB:TLB_shootdowns
3512 ± 3% -12.1% 3088 ± 9% interrupts.CPU26.TLB:TLB_shootdowns
3610 ± 5% -13.2% 3134 ± 5% interrupts.CPU29.TLB:TLB_shootdowns
3602 ± 5% -17.4% 2973 ± 8% interrupts.CPU31.TLB:TLB_shootdowns
3548 ± 4% -12.7% 3098 ± 6% interrupts.CPU32.TLB:TLB_shootdowns
3637 ± 5% -15.2% 3085 ± 7% interrupts.CPU35.TLB:TLB_shootdowns
3588 ± 3% -12.7% 3131 ± 9% interrupts.CPU56.TLB:TLB_shootdowns
3664 ± 5% -14.3% 3142 ± 10% interrupts.CPU59.TLB:TLB_shootdowns
3542 ± 6% -13.0% 3082 ± 5% interrupts.CPU64.TLB:TLB_shootdowns
3539 ± 5% +12.4% 3977 ± 11% interrupts.CPU7.TLB:TLB_shootdowns
3485 ± 5% -13.0% 3033 ± 10% interrupts.CPU70.TLB:TLB_shootdowns
3651 ± 4% -16.1% 3062 ± 9% interrupts.CPU71.TLB:TLB_shootdowns
1.557e+10 ± 2% -8.1% 1.431e+10 ± 3% perf-stat.i.branch-instructions
1.887e+08 ± 9% -21.4% 1.484e+08 perf-stat.i.cache-misses
5.026e+08 ± 2% -8.6% 4.595e+08 ± 4% perf-stat.i.cache-references
2609 ± 3% +6.0% 2766 perf-stat.i.cycles-between-cache-misses
7.344e+09 -6.6% 6.861e+09 ± 3% perf-stat.i.dTLB-stores
6.969e+10 -7.6% 6.44e+10 ± 3% perf-stat.i.instructions
0.37 ± 2% -7.2% 0.34 perf-stat.i.ipc
43.29 ± 5% +3.7 46.94 ± 4% perf-stat.i.node-load-miss-rate%
15474576 ± 8% -23.9% 11782653 ± 13% perf-stat.i.node-load-misses
28.50 ± 6% +3.2 31.74 ± 3% perf-stat.i.node-store-miss-rate%
26447212 ± 5% -11.6% 23382361 ± 4% perf-stat.i.node-stores
0.61 +0.0 0.65 perf-stat.overall.branch-miss-rate%
37.58 ± 8% -5.2 32.39 ± 4% perf-stat.overall.cache-miss-rate%
2.91 ± 2% +3.4% 3.00 perf-stat.overall.cpi
1091 ± 10% +19.5% 1303 ± 3% perf-stat.overall.cycles-between-cache-misses
0.17 +0.0 0.18 ± 3% perf-stat.overall.dTLB-store-miss-rate%
5922 -6.6% 5533 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.34 ± 2% -3.3% 0.33 perf-stat.overall.ipc
1.462e+10 ± 2% -5.3% 1.384e+10 ± 3% perf-stat.ps.branch-instructions
1.765e+08 ± 10% -18.7% 1.435e+08 perf-stat.ps.cache-misses
6.926e+09 -4.0% 6.648e+09 ± 3% perf-stat.ps.dTLB-stores
6.555e+10 -5.0% 6.229e+10 ± 3% perf-stat.ps.instructions
14736035 ± 8% -21.6% 11547658 ± 14% perf-stat.ps.node-load-misses
2.703e+12 -4.4% 2.585e+12 ± 2% perf-stat.total.instructions
stress-ng.madvise.ops
7000 +-+------------------------------------------------------------------+
6800 +-+ + |
| : |
6600 +-+.+ ++.+ +.+ ++ + ++. : :+.+ +. ++ .++ +. ++ + +. + |
6400 +-+ ++ ++ + +.+ + ++ + ++ + : + ++ + ++.+ + ++ :+.++|
| + + |
6200 +-+ |
6000 +-+ |
5800 +-+ |
| |
5600 +-+ |
5400 +-+ |
OO OOO O O O O O O O O O O |
5200 +-O O O OO OO OO O OOO O OO O O OOOO O O |
5000 +-+------------------------------------------------------------------+
stress-ng.madvise.ops_per_sec
7000 +-+------------------------------------------------------------------+
6800 +-+ + |
| : |
6600 +-+ : : |
6400 +-+.+++++.++++.+++++.+++++.++ ++.++++.+ +++.+++++.+++++.++++.++ ++.++|
| + + |
6200 +-+ |
6000 +-+ |
5800 +-+ |
| |
5600 +-+ |
5400 +-+ |
|O O O O O O |
5200 O-O O OOO O OO OO OOOOO OOOOO OOOO OO O O O |
5000 +-+---------O---------------------------OO---------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen