On Tue, 20 Oct 2020, Huang, Ying wrote:

> >> =========================================================================================
> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
> >>   gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
> >> 
> >> commit: 
> >>   dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
> >>   85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
> >> 
> >> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c 
> >> ---------------- --------------------------- 
> >>        fail:runs  %reproduction    fail:runs
> >>            |             |             |    
> >>           1:4           24%           2:4     perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
> >>           3:4           53%           5:4     perf-profile.calltrace.cycles-pp.error_entry.do_access
> >>           9:4          -27%           8:4     perf-profile.children.cycles-pp.error_entry
> >>           4:4          -10%           4:4     perf-profile.self.cycles-pp.error_entry
> >>          %stddev     %change         %stddev
> >>              \          |                \  
> >>     477291            -9.1%     434041        vm-scalability.median
> >>   49791027            -8.7%   45476799        vm-scalability.throughput
> >>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time
> >>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time.max
> >>      50364 ±  6%     +24.1%      62482 ± 10%  vm-scalability.time.involuntary_context_switches
> >>       2237            +7.8%       2412        vm-scalability.time.percent_of_cpu_this_job_got
> >>       3084           +18.2%       3646        vm-scalability.time.system_time
> >>       1921            -4.2%       1839        vm-scalability.time.user_time
> >>      13.68            +2.2       15.86        mpstat.cpu.all.sys%
> >>      28535 ± 30%     -47.0%      15114 ± 79%  numa-numastat.node0.other_node
> >>     142734 ± 11%     -19.4%     115000 ± 17%  numa-meminfo.node0.AnonPages
> >>      11168 ±  3%      +8.8%      12150 ±  5%  numa-meminfo.node1.PageTables
> >>      76.00            -1.6%      74.75        vmstat.cpu.id
> >>       3626            -1.9%       3555        vmstat.system.cs
> >>    2214928 ±166%     -96.6%      75321 ±  7%  cpuidle.C1.usage
> >>     200981 ±  7%     -18.0%     164861 ±  7%  cpuidle.POLL.time
> >>      52675 ±  3%     -16.7%      43866 ± 10%  cpuidle.POLL.usage
> >>      35659 ± 11%     -19.4%      28754 ± 17%  numa-vmstat.node0.nr_anon_pages
> >>    1248014 ±  3%     +10.9%    1384236        numa-vmstat.node1.nr_mapped
> >>       2722 ±  4%     +10.6%       3011 ±  5%  numa-vmstat.node1.nr_page_table_pages
> >
> > I'm not sure that I'm reading this correctly, but I suspect that this 
> > just happens because of NUMA: memory affinity will obviously impact 
> > vm-scalability.throughput quite substantially, but I don't think the 
> > bisected commit can be to blame.  Commit 85b9f46e8ea4 ("mm, thp: track 
> > fallbacks due to failed memcg charges separately") simply adds new 
> > count_vm_event() calls in a couple of areas to track thp fallbacks due 
> > to memcg limits separately from those due to fragmentation.
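
(As an aside, since the commit only adds counters, its effect should be 
directly observable in /proc/vmstat.  A minimal userspace sketch to watch 
the relevant counters during a run -- the exact counter names, e.g. 
thp_fault_fallback_charge, are assumed from the commit description rather 
than copied from the diff:

/*
 * Minimal sketch: dump the THP fallback counters from /proc/vmstat.
 * Counter names are assumed (thp_fault_fallback*, thp_file_fallback*);
 * adjust to whatever the running kernel actually exports.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[256];

	if (!f) {
		perror("fopen /proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* print both the pre-existing and the newly added counters */
		if (strstr(line, "thp_fault_fallback") ||
		    strstr(line, "thp_file_fallback"))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}

Sampling that before and after the benchmark would confirm whether any 
memcg-charge fallbacks happen during the run at all.)
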
> >
> > It's likely a question about the testing methodology in general: for 
> > memory-intensive benchmarks, I suggest they be configured in a manner 
> > that lets us expect consistent memory access latency at the hardware 
> > level when running on a NUMA system.
> 
> So you think it's better to bind processes to a NUMA node or CPU?  But we
> want to use this test case to capture NUMA/CPU placement/balance issues
> too.
> 

No, because binding to a specific socket may cause other performance 
"improvements" or "degradations" depending on how fragmented local memory 
is, or whether or not it's under memory pressure.  Is the system rebooted 
before testing so that we have a consistent state of memory availability 
and fragmentation across sockets?
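
One way to make that comparable is to record per-node fragmentation state 
alongside each run.  A rough sketch, assuming the usual x86_64 
/proc/buddyinfo layout (free page counts per node, zone and order, with 11 
order columns); blocks of order >= 9 are the ones that can back a 2MB THP:

/*
 * Rough sketch: summarize per-node availability of hugepage-sized blocks
 * before a benchmark run.  /proc/buddyinfo reports free page counts per
 * node, zone and order; on x86_64 a 2MB THP needs an order-9 block.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/buddyinfo", "r");
	char zone[32];
	int node;

	if (!f) {
		perror("fopen /proc/buddyinfo");
		return 1;
	}
	while (fscanf(f, " Node %d, zone %31s", &node, zone) == 2) {
		unsigned long order9plus = 0, count;
		int order;

		/* 11 order columns (0..10) assumed, as on x86_64 */
		for (order = 0; order <= 10 && fscanf(f, "%lu", &count) == 1; order++)
			if (order >= 9)
				order9plus += count;
		printf("node %d zone %-8s free blocks >= 2MB: %lu\n",
		       node, zone, order9plus);
	}
	fclose(f);
	return 0;
}

Comparing that output across runs (and across the two kernels) would at 
least show whether each run started with a similar supply of 
hugepage-sized blocks on each node.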

> 0day solves the problem in another way.  We run the test case multiple
> times, calculate the average and standard deviation, and then compare.
> 
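
(For concreteness, the comparison described above amounts to something 
like the following -- the sample values are placeholders, not the actual 
report data:

/*
 * Minimal sketch of a multi-run comparison: mean, sample stddev and
 * relative change between two sets of runs.  The values below are
 * placeholders, not measurements.  Build with -lm.
 */
#include <math.h>
#include <stdio.h>

static void stats(const double *v, int n, double *mean, double *stddev)
{
	double sum = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	*mean = sum / n;
	for (i = 0; i < n; i++)
		var += (v[i] - *mean) * (v[i] - *mean);
	*stddev = sqrt(var / (n - 1));	/* sample standard deviation */
}

int main(void)
{
	/* placeholder throughput samples for the base and bisected commits */
	double base[] = { 49.7e6, 49.9e6, 49.6e6, 49.8e6 };
	double test[] = { 45.4e6, 45.6e6, 45.3e6, 45.5e6 };
	double m0, s0, m1, s1;

	stats(base, 4, &m0, &s0);
	stats(test, 4, &m1, &s1);
	printf("base: %.0f +- %.0f\n", m0, s0);
	printf("test: %.0f +- %.0f (%+.1f%%)\n", m1, s1,
	       100.0 * (m1 - m0) / m0);
	return 0;
}

The %stddev column in the report above presumably expresses the same 
deviation relative to the mean.)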

Depending on fragmentation or memory availability at the time of the run, 
any benchmark whose results can be impacted by hugepage backing may still 
be adversely affected.
