I would like to caution people about this conventional wisdom. For UFS, an I/O size tunable like maxphys ended up being an important parameter to the write throttle algorithm: the bigger it was, the less tightly the application was coupled to the storage, and some important gains could come from that.
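(For anyone who wants to experiment with it, maxphys is raised with an /etc/system entry along the lines below and a reboot; the 8 MB value is only an example, and depending on the HBA/target driver there may be separate per-driver transfer-size caps that also apply.)

    * example only: raise the maximum physical I/O transfer size to 8 MB
    set maxphys=0x800000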
The same applies to raw disk. Here dd issues a single I/O at a time: it does a read (one copy) and a write (another copy), then issues a new I/O and waits for it. This kind of stop-and-go behavior requires a very large chunk size to get to platter speed.

ZFS does not behave like this. ZFS carves the data up into 128K blocks but issues a number of I/Os concurrently through its I/O scheduler. Currently the maximum number of concurrent I/Os is 35 per leaf device, and the expectation is that 35 concurrent I/Os per spindle is sufficient to reach the maximum throughput rate. A quick test on my system, FWIW, shows that this expectation is met.

Robert's data thus point to a shortcoming that can be explained by one (or more) of the following:

  - 128K is too small a block size (which I doubt very much, given enough concurrency);
  - the ZFS sync code is sometimes not aggressive enough and hits a lull in how it drives the I/O;
  - the storage subsystem requires more than 35 concurrent 128K I/Os to reach its top speed.

We just need to figure out which of these is at play to know what to fix.

-r

Richard McDougall writes:

> Setting larger maxphys has in my experience made quite a difference,
> depending on the target (actually, on x64 it makes it slower).
> Generally, performance increases as the transfer size increases has
> been my observation, for some classes of targets. This is likely due
> to the target's ability to seek and process a single large request
> (for example, some targets won't cluster 8x1MB requests into a single
> operation).
>
> Henry Newman of Instrumental reported some time back that he has
> observed gains all the way up to 32MB for sequential I/O on high-end
> HPC-oriented arrays.
>
> Regards,
>
> Richard.
>
> On Sat, Apr 29, 2006 at 06:12:07AM -0700, Robert Milkowski wrote:
> > "But even if you set it to 8 Mbytes, the protocol overhead per
> > transfer is so small that the performance gains will be barely
> > noticeable."
> >
> > Well, I can't agree. v440 with 3510 JBOD with 15K 73GB disks,
> > connected with two links (MPxIO).
> >
> > bash-3.00# dd if=/dev/zero of=/dev/rdsk/c5t500000E0119495A0d0s0 bs=1024k
> >                    extended device statistics
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0   63.8    0.0  65381.4  0.0  1.0    0.0   15.0   0  96 ssd193
> >                    extended device statistics
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0   66.2    0.0  67753.5  0.0  1.0    0.0   14.5   0  96 ssd193
> >                    extended device statistics
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0   63.0    0.0  64521.3  0.0  1.0    0.0   15.2   0  96 ssd193
> >
> > bash-3.00# dd if=/dev/zero of=/dev/rdsk/c5t500000E0119495A0d0s0 bs=8192k
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0    9.8    0.0  80642.8  0.0  0.9    0.0   93.7   0  92 c5
> >    0.0    9.8    0.0  80641.9  0.0  0.9    0.0   93.7   0  92 c5t500000E0119495A0d0
> >                    extended device statistics
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0   10.2    0.0  83267.8  0.0  0.9    0.0   91.1   0  93 c5
> >    0.0   10.2    0.0  83268.1  0.0  0.9    0.0   91.1   0  93 c5t500000E0119495A0d0
> >                    extended device statistics
> >    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
> >    0.0    9.8    0.0  80552.4  0.0  0.9    0.0   93.8   0  92 c5
> >    0.0    9.8    0.0  80551.6  0.0  0.9    0.0   93.8   0  92 c5t500000E0119495A0d0
> >
> > So 1024KB is about 25% slower in this case.
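P.S. To put some rough numbers on the above (a back-of-the-envelope sketch only, not a measurement: the 35-deep queue and 128K block size come from my note at the top, the service times from Robert's asvc_t columns, and I am assuming a 128K write completes in roughly the same ~15 ms, which is only approximately true):

    /*
     * Sketch: with the device queue kept full, sustained throughput is
     * roughly outstanding * io_size / service_time (Little's law).
     * All inputs are assumptions taken from the figures in this thread.
     */
    #include <stdio.h>

    int
    main(void)
    {
            double io_size_mb  = 128.0 / 1024.0;    /* ZFS data block size     */
            double outstanding = 35.0;              /* max I/Os per leaf vdev  */
            double svc_time_s  = 0.015;             /* assumed ~15 ms per I/O  */

            /* What 35 concurrent 128K writes could sustain at 15 ms each. */
            printf("zfs-style:   ~%.0f MB/s\n",
                outstanding * io_size_mb / svc_time_s);

            /* Robert's raw dd, one I/O in flight at a time (actv ~= 1). */
            printf("dd bs=1024k: ~%.0f MB/s\n", 1.0 / 0.0150);
            printf("dd bs=8192k: ~%.0f MB/s\n", 8.0 / 0.0937);
            return (0);
    }

By that rough model a single stream of 128K writes with 35 I/Os in flight should comfortably outrun either dd case, which is why I doubt the block size itself is the limiter; the model also ignores the time dd spends copying between I/Os, which is why the bs=8192k estimate comes out a little above the ~80 MB/s actually measured.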