On Thu, Feb 23, 2017 at 08:35:45AM +0100, Michal Hocko wrote:
> >      57.60 ±  0%     -11.1%      51.20 ±  0%  fsmark.files_per_sec
> >     607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time
> >     607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time.max
> >      14317 ±  6%     -12.2%      12568 ±  7%  fsmark.time.involuntary_context_switches
> >       1864 ±  0%      +0.5%       1873 ±  0%  fsmark.time.maximum_resident_set_size
> >      12425 ±  0%     +23.3%      15320 ±  3%  fsmark.time.minor_page_faults
> >      33.00 ±  3%     -33.9%      21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
> >     203.49 ±  3%     -28.1%     146.31 ±  1%  fsmark.time.system_time
> >     605701 ±  0%      +3.6%     627486 ±  0%  fsmark.time.voluntary_context_switches
> >     307106 ±  2%     +20.2%     368992 ±  9%  interrupts.CAL:Function_call_interrupts
> >     183040 ±  0%     +23.2%     225559 ±  3%  softirqs.BLOCK
> >      12203 ± 57%    +236.4%      41056 ±103%  softirqs.NET_RX
> >     186118 ±  0%     +21.9%     226922 ±  2%  softirqs.TASKLET
> >      14317 ±  6%     -12.2%      12568 ±  7%  time.involuntary_context_switches
> >      12425 ±  0%     +23.3%      15320 ±  3%  time.minor_page_faults
> >      33.00 ±  3%     -33.9%      21.80 ±  1%  time.percent_of_cpu_this_job_got
> >     203.49 ±  3%     -28.1%     146.31 ±  1%  time.system_time
> >       3.47 ±  3%     -13.0%       3.02 ±  1%  turbostat.%Busy
> >      99.60 ±  1%      -9.6%      90.00 ±  1%  turbostat.Avg_MHz
> >      78.69 ±  1%      +1.7%      80.01 ±  0%  turbostat.CorWatt
> >       3.56 ± 61%     -91.7%       0.30 ± 76%  turbostat.Pkg%pc2
> >     207790 ±  0%      -8.2%     190654 ±  1%  vmstat.io.bo
> >   30667691 ±  0%     +65.9%   50890669 ±  1%  vmstat.memory.cache
> >   34549892 ±  0%     -58.4%   14378939 ±  4%  vmstat.memory.free
> >       6768 ±  0%      -1.3%       6681 ±  1%  vmstat.system.cs
> >  1.089e+10 ±  2%     +13.4%  1.236e+10 ±  3%  cpuidle.C1E-IVT.time
> >   11475304 ±  2%     +13.4%   13007849 ±  3%  cpuidle.C1E-IVT.usage
> >    2.7e+09 ±  6%     +13.2%  3.057e+09 ±  3%  cpuidle.C3-IVT.time
> >    2954294 ±  6%     +14.3%    3375966 ±  3%  cpuidle.C3-IVT.usage
> >   96963295 ± 14%     +17.5%  1.139e+08 ± 12%  cpuidle.POLL.time
> >       8761 ±  7%     +17.6%      10299 ±  9%  cpuidle.POLL.usage
> >   30454483 ±  0%     +66.4%   50666102 ±  1%  meminfo.Cached
> > 
> > Do you see what's happening?
> 
> not really. All I could see in the previous data was that the memory
> locality was different (and better) with my patch, which I cannot
> explain either because get_scan_count is always per-node thing. Moreover
> the change shouldn't make any difference for normal GFP_KERNEL requests
> on 64b systems because the reclaim index covers all zones so there is
> nothing to skip over.
> 
> > Or is there anything we can do to improve fsmark benchmark setup to
> > make it more reasonable?
> 
> Unfortunatelly I am not an expert on this benchmark. Maybe Mel knows
> better.
There is not much to be an expert on with that benchmark. It creates a
bunch of files of the requested size for a number of iterations. In async
configurations, the result can be heavily skewed by the first few
iterations that run before the dirty limits are hit. Once that point is
reached, the files/sec figure drops rapidly to some value below the write
speed of the underlying device. Hence, looking at the average performance
is risky and very sensitive to exact timing unless the warm-up phase is
properly accounted for.

In async configurations, stalls are dominated by balance_dirty_pages and
by filesystem details such as whether it needs to wait for space in a
transaction log, and those also limit the overall performance of the
workload. Even once the stable phase is reached, there is still quite a
lot of variability due to the timing of the writeback threads, which
causes some jitter, as well as the usual concerns with multiple threads
writing to different parts of the disk. When NUMA is taken into account,
the size of the NUMA nodes matters, as asymmetric sizes affect when
remote memory is used and, to a lesser extent, when balance_dirty_pages
is triggered.

The benchmark is what it is. You can force it to generate stable figures
but then it won't have the same behaviour, so it all depends on how you
define "reasonable". At the very minimum, take into account that an
average over multiple iterations will be skewed early in the workload's
lifetime by the fact that the dirty limits have not been hit yet.

-- 
Mel Gorman
SUSE Labs
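For illustration only, here is a rough, untested sketch of how the
warm-up skew could be filtered out before averaging per-iteration
results. It assumes a hypothetical text file with one files/sec value
per iteration per line (extracted from fsmark output) and uses an
arbitrary 50% cut-off relative to the first iteration to decide where
the steady state begins; neither the helper nor the threshold is part
of fsmark itself.

#!/usr/bin/env python3
# Sketch: average fsmark files/sec while discarding warm-up iterations
# that completed before dirty throttling kicked in. The input format
# (one files/sec value per line, in iteration order) and the 0.5
# threshold are assumptions for illustration, not fsmark defaults.
import sys
import statistics

def steady_state_average(rates, warmup_ratio=0.5):
    """Skip leading iterations until the rate drops below warmup_ratio
    of the first (page-cache-only) iteration, then average the rest."""
    if not rates:
        return 0.0
    cutoff = rates[0] * warmup_ratio
    start = next((i for i, r in enumerate(rates) if r < cutoff), None)
    if start is None:
        # The run apparently never hit the dirty limits; keep everything.
        return statistics.mean(rates)
    return statistics.mean(rates[start:])

if __name__ == "__main__":
    rates = [float(line) for line in open(sys.argv[1]) if line.strip()]
    print("steady-state files/sec: %.2f" % steady_state_average(rates))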