Hi, Christoph

> From: Zhao Lei [mailto:zhao...@cn.fujitsu.com]
> Sent: Monday, March 09, 2015 10:47 AM
> To: 'Christoph Hellwig'; 'Jan Kara'
> Cc: 'Tejun Heo'; 'Jens Axboe'
> Subject: RE: Regression caused by using node_to_bdi()
>
> Hi, Christoph and Jan
>
> > From: 'Christoph Hellwig' [mailto:h...@lst.de]
> > Sent: Sunday, March 08, 2015 11:34 PM
> > To: Jan Kara
> > Cc: Zhao Lei; 'Christoph Hellwig'; 'Tejun Heo'; 'Jens Axboe'
> > Subject: Re: Regression caused by using node_to_bdi()
> >
> > On Sun, Mar 08, 2015 at 11:29:16AM +0100, Jan Kara wrote:
> > > Frankly, I doubt the cost of inode_to_bdi() is the reason for the
> > > slowdown here. If I read the numbers right, the throughput dropped
> > > from 135 MB/s on average to 130 MB/s on average. Such a load is
> > > hardly going to saturate the CPU enough for additional cycles in
> > > inode_to_bdi() to matter. A load like this is completely IO bound
> > > unless you have a really fast drive (doing GB/s). What are the
> > > throughput numbers just before / after this commit?
>
> These are the performance data before and after this patch in the bisect:
What is your opinion about this regression? Please tell me if you need
additional tests or results on my env.

Thanks,
Zhaolei

> v3.19-rc5_00005_495a27 : io_speed: valcnt=10 avg=137.409 range=[134.820,139.000] diff=3.10% stdev=1.574 cv=1.15%
> v3.19-rc5_00006_26ff13 : io_speed: valcnt=10 avg=136.534 range=[132.390,139.500] diff=5.37% stdev=2.659 cv=1.95%
> v3.19-rc5_00007_de1414 : io_speed: valcnt=10 avg=130.358 range=[129.070,132.150] diff=2.39% stdev=1.120 cv=0.86% <- *this patch*
> v3.19-rc5_00008_b83ae6 : io_speed: valcnt=10 avg=129.549 range=[129.200,129.910] diff=0.55% stdev=0.241 cv=0.19%
> v3.19-rc5_00011_c4db59 : io_speed: valcnt=10 avg=130.033 range=[129.050,131.620] diff=1.99% stdev=0.854 cv=0.66%
>
> > > What is the CPU load while the benchmark is running?
>
> I hadn't recorded the CPU load in testing; I'll record it if it is
> necessary for debugging.
>
> This is one of sysbench's logs:
>
> sysbench 0.4.12: multi-threaded system evaluation benchmark
>
> 1 files, 4194304Kb each, 4096Mb total
> Creating files for the test...
> sysbench 0.4.12: multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 1
>
> Extra file open flags: 0
> 1 files, 4Gb each
> 4Gb total file size
> Block size 32Kb
> Using synchronous I/O mode
> Doing sequential write (creation) test
> Threads started!
> Done.
>
> Operations performed: 0 Read, 131072 Write, 0 Other = 131072 Total
> Read 0b  Written 4Gb  Total transferred 4Gb  (132.15Mb/sec)
> 4228.75 Requests/sec executed
>
> Test execution summary:
>     total time:                          30.9955s
>     total number of events:              131072
>     total time taken by event execution: 30.8731
>     per-request statistics:
>          min:                       0.01ms
>          avg:                       0.24ms
>          max:                      30.80ms
>          approx. 95 percentile:     0.03ms
>
> Threads fairness:
>     events (avg/stddev):           131072.0000/0.00
>     execution time (avg/stddev):   30.8731/0.00
>
> sysbench 0.4.12: multi-threaded system evaluation benchmark

> > > How much memory does the machine have?
> 2G mem, 2-core machine; the test is running on a 1T sata disk.
>
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]# cat /proc/meminfo
> MemTotal:        2015812 kB
> MemFree:          627416 kB
> MemAvailable:    1755488 kB
> Buffers:          345876 kB
> Cached:           772788 kB
> SwapCached:            0 kB
> Active:           848864 kB
> Inactive:         320044 kB
> Active(anon):      54128 kB
> Inactive(anon):     5080 kB
> Active(file):     794736 kB
> Inactive(file):   314964 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:         50140 kB
> Mapped:            41636 kB
> Shmem:              8984 kB
> Slab:             200312 kB
> SReclaimable:     187308 kB
> SUnreclaim:        13004 kB
> KernelStack:        1728 kB
> PageTables:         4056 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     1007904 kB
> Committed_AS:     205956 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      539968 kB
> VmallocChunk:   34359195223 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:      6144 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:       61056 kB
> DirectMap2M:     2000896 kB
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]# cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
> stepping        : 10
> microcode       : 0xa0b
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
> tpr_shadow vnmi flexpriority
> bugs            :
> bogomips        : 5851.89
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
> stepping        : 10
> microcode       : 0xa0b
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
> tpr_shadow vnmi flexpriority
> bugs            :
> bogomips        : 5851.89
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]#
>
> Please tell me if you are interested in more information or operations.
>
> Thanks
> Zhaolei

> > I remember an issue a few years ago where simply reverting a patch
> > that uninlined the rw_sem code fixed a buffered I/O performance
> > regression when using Samba on a very low end arm device, so
> > everything is possible.
> >
> > I'd still like to ensure the numbers are reproducible in this case
> > first, and look at all the information Jan asked for. As a next step
> > we could then look at using an inline version to check if that helps.
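(Editor's note, not part of the original thread.) The summary columns in the
bisect table above can be cross-checked against each other. From the published
numbers, `diff` appears to be range/min and `cv` stdev/avg; this is an
inference from the values, not something stated in the thread, and the ten raw
samples per kernel were not posted, so `stdev` itself cannot be recomputed. A
minimal sketch:

```python
# Cross-check the per-kernel summary statistics from the bisect table.
# Assumption (inferred, not stated): diff = (max - min) / min,
#                                    cv   = stdev / avg.
rows = {
    # kernel          (avg,     min,     max,     diff%, stdev, cv%)
    "00005_495a27": (137.409, 134.820, 139.000, 3.10, 1.574, 1.15),
    "00006_26ff13": (136.534, 132.390, 139.500, 5.37, 2.659, 1.95),
    "00007_de1414": (130.358, 129.070, 132.150, 2.39, 1.120, 0.86),
    "00008_b83ae6": (129.549, 129.200, 129.910, 0.55, 0.241, 0.19),
    "00011_c4db59": (130.033, 129.050, 131.620, 1.99, 0.854, 0.66),
}

for name, (avg, lo, hi, diff_pct, stdev, cv_pct) in rows.items():
    # Both derived columns round to the published values.
    assert abs((hi - lo) / lo * 100 - diff_pct) < 0.01, name
    assert abs(stdev / avg * 100 - cv_pct) < 0.01, name

# Size of the drop at the suspect commit (00006 -> 00007), in percent.
drop = (136.534 - 130.358) / 136.534 * 100
print(f"throughput drop at the suspect commit: {drop:.1f}%")
```

The check confirms the table is internally consistent and puts the regression
at roughly 4.5% average throughput, several stdevs outside the scatter of the
neighbouring kernels, which supports treating it as a real effect despite the
small absolute drop.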