On Sun, 2015-02-15 at 13:02 -0500, Trond Myklebust wrote:
> Hi guys,
>
> On Sun, Feb 15, 2015 at 2:57 AM, Huang Ying <ying.hu...@intel.com> wrote:
> > FYI, we noticed the below changes on
> >
> > commit c4a7ca774949960064dac11b326908f28407e8c3 ("SUNRPC: Allow waiting on
> > memory allocation")
> >
> > testbox/testcase/testparams:
> >   nhm4/fsmark/performance-1x-32t-1HDD-f2fs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >      52524 ±  0%      +6.0%      55672 ±  0%  fsmark.time.involuntary_context_switches
> >        436 ± 14%     +54.9%        676 ± 20%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >        433 ± 15%     +54.7%        670 ± 21%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >       8348 ±  7%     +27.0%      10602 ±  9%  sched_debug.cfs_rq[0]:/.min_vruntime
> >     190081 ± 13%     +32.7%     252269 ± 13%  sched_debug.cpu#0.sched_goidle
> >     205783 ± 12%     +30.2%     267903 ± 13%  sched_debug.cpu#0.ttwu_local
> >     464065 ± 11%     +26.6%     587524 ± 12%  sched_debug.cpu#0.nr_switches
> >     464278 ± 11%     +26.6%     587734 ± 12%  sched_debug.cpu#0.sched_count
> >      15807 ± 11%     +19.6%      18910 ± 12%  sched_debug.cpu#4.nr_load_updates
> >     300041 ±  8%     +20.3%     360969 ± 10%  sched_debug.cpu#0.ttwu_count
> >       1863 ±  9%     +18.1%       2201 ± 10%  sched_debug.cfs_rq[4]:/.exec_clock
> >
> > testbox/testcase/testparams:
> >   nhm4/fsmark/performance-1x-32t-1HDD-btrfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >        fail:runs  %reproduction     fail:runs
> >            |             |              |
> >      52184 ±  0%      +5.6%      55122 ±  0%  fsmark.time.involuntary_context_switches
> >        557 ± 19%     +21.5%        677 ±  9%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        217 ± 19%     -42.9%        124 ± 21%  sched_debug.cfs_rq[2]:/.load
> >      45852 ± 14%     -39.4%      27773 ± 24%  sched_debug.cpu#7.ttwu_local
> >        457 ± 18%     +50.1%        686 ± 20%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >        455 ± 18%     +46.7%        668 ± 19%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >      66605 ± 10%     -26.7%      48826 ± 14%  sched_debug.cpu#7.sched_goidle
> >      78249 ±  9%     -22.5%      60678 ± 11%  sched_debug.cpu#7.ttwu_count
> >     153506 ±  9%     -22.7%     118649 ± 12%  sched_debug.cpu#7.nr_switches
> >     153613 ±  9%     -22.7%     118755 ± 12%  sched_debug.cpu#7.sched_count
> >      15806 ±  6%     +19.2%      18833 ± 18%  sched_debug.cpu#4.nr_load_updates
> >       2171 ±  5%     +15.6%       2510 ± 13%  sched_debug.cfs_rq[4]:/.exec_clock
> >       9924 ± 11%     -27.0%       7244 ± 25%  sched_debug.cfs_rq[3]:/.min_vruntime
> >       3156 ±  4%     -13.4%       2734 ±  8%  sched_debug.cfs_rq[7]:/.min_vruntime
> >
> > testbox/testcase/testparams:
> >   nhm4/fsmark/performance-1x-32t-1HDD-ext4-nfsv4-9B-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0  c4a7ca774949960064dac11b32
> > ----------------  --------------------------
> >     104802 ±  0%      +7.7%     112883 ±  0%  fsmark.time.involuntary_context_switches
> >     471755 ±  0%      -1.3%     465592 ±  0%  fsmark.time.voluntary_context_switches
> >       1977 ± 36%     +90.8%       3771 ±  8%  sched_debug.cpu#4.curr->pid
> >          2 ± 34%     +80.0%          4 ± 24%  sched_debug.cpu#6.cpu_load[1]
> >          4 ± 33%     +83.3%          8 ± 31%  sched_debug.cpu#6.cpu_load[0]
> >        193 ± 17%     +48.0%        286 ± 19%  sched_debug.cfs_rq[2]:/.blocked_load_avg
> >        196 ± 17%     +47.5%        290 ± 19%  sched_debug.cfs_rq[2]:/.tg_load_contrib
> >         96 ± 18%     +40.6%        135 ± 11%  sched_debug.cfs_rq[7]:/.load
> >         97 ± 18%     +38.5%        135 ± 11%  sched_debug.cpu#7.load
> >       2274 ±  7%     -16.5%       1898 ±  3%  proc-vmstat.pgalloc_dma
> >        319 ±  6%     -29.7%        224 ± 24%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        314 ±  5%     -29.4%        222 ± 25%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >        621 ± 10%     +41.9%        881 ± 37%  sched_debug.cfs_rq[4]:/.avg->runnable_avg_sum
> >
> > nhm4: Nehalem
> > Memory: 4G
> >
> >                 fsmark.time.involuntary_context_switches
> >
> > [ASCII plot omitted: bisect-good samples (*) cluster around 104000-105000,
> >  bisect-bad samples (O) cluster around 112000-113000]
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> > To reproduce:
> >
> >         apt-get install ruby
> >         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> >         cd lkp-tests
> >         bin/setup-local job.yaml  # the job file attached in this email
> >         bin/run-local   job.yaml
>
> So this is on a loopback NFS setup (i.e. the server resides on the
> same node as the client, which just mounts from the loopback IP
> address 127.0.0.1)?

Yes. This is on a loopback NFS setup.
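Roughly, the mount amounts to something like the sketch below. The device
name, paths, and export options here are made up for illustration (the
actual runs are driven by the lkp-tests job file attached to the original
report), but the client does mount from 127.0.0.1:

    # illustration only: export a local filesystem and mount it back
    # over NFSv4 from the loopback address, with knfsd running locally
    mkfs.f2fs /dev/sdb1                     # the 1HDD partition under test
    mount /dev/sdb1 /export
    echo '/export 127.0.0.1(rw,no_root_squash,fsid=0)' >> /etc/exports
    exportfs -ra                            # server side, same machine
    mount -t nfs4 127.0.0.1:/ /mnt/nfs      # client side, also same machine
    # fsmark then creates and fsyncs its files under /mnt/nfs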
> That's a fairly quirky setup as far as memory management goes. In low
> memory situations, you have a very nasty feedback mechanism whereby
> the NFS server ends up pushing the client to write back more data,
> increasing the memory pressure on the NFS server, etc.
> It is quite possible that allowing the NFS client to block more
> aggressively in low memory situations could worsen that feedback
> mechanism. However, that's not our main target platform; we actively
> discourage people from using loopback NFS in production systems.
>
> Is there any way you could confirm this performance change using a
> remote NFS server instead of the loopback NFS?

We are working on a remote NFS setup now :)

Best Regards,
Huang, Ying