I would like to caution people about this conventional
wisdom. For UFS, an I/O size tunable such as maxphys ended up
being an important parameter to the write throttle algorithm:
the larger it was, the less tightly coupled the application
was to the storage, and some important gains could come from
that.
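
For reference, a minimal sketch of how maxphys is commonly
raised on Solaris. The values and the per-driver limits below
are illustrative assumptions on my part; which driver limit
applies (ssd vs. sd), and whether it belongs in /etc/system or
in the driver's .conf file, depends on the platform, and a
reboot is needed for it to take effect:

        * /etc/system -- illustrative values only
        set maxphys=0x800000
        * per-driver max transfer size; use the driver actually in play
        set ssd:ssd_max_xfer_size=0x800000
        set sd:sd_max_xfer_size=0x800000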

Same with raw disk; here dd issues a single I/O at a time: it
does a read (one copy) and a write (another copy), then issues
a new I/O and waits for it. This kind of stop-and-go behavior
requires a very large chunk size to get to platter speed. ZFS
does not behave like this.
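
You can see this directly in Robert's iostat data below: actv
hovers around 1 for the raw-device dd, i.e. a single outstanding
I/O at any time. Watching it live is just (illustrative
invocation):

        # watch per-device queues; for a raw-device dd the actv column
        # stays near 1.0 regardless of the bs used
        iostat -xnz 1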

ZFS carves the data up into 128K blocks but will issue a
number of I/Os concurrently through its I/O scheduler.
Currently the maximum number of concurrent I/Os is 35 per leaf
device. The expectation is that 35 concurrent I/Os per spindle
are sufficient to reach the maximum throughput rate. A quick
test on my system, FWIW, shows that this expectation is met.
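
For the curious: on builds where the per-vdev limit is exposed
as the kernel variable zfs_vdev_max_pending (an assumption on
my part; the name has moved around between builds), it can be
read and experimented with live via mdb:

        # read the current per-vdev concurrency limit as a decimal
        echo "zfs_vdev_max_pending/D" | mdb -k
        # bump it to 70 (0t = decimal) on a live system, for testing only
        echo "zfs_vdev_max_pending/W 0t70" | mdb -kw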

Robert's data thus point to a shortcoming that can be
explained by one of the following:

        1) 128K is too small a block size (which I doubt very
           much, given enough concurrency);

        2) the ZFS sync code is sometimes not aggressive enough
           and hits a lull in how it drives the I/O;

        3) the storage subsystem requires more than 35
           concurrent 128K I/Os to reach its top speed.


We just need to figure out which one(s) is (are) at play to
know what to fix.
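
A rough way to narrow it down while the workload runs
(illustrative commands, to be adapted to the devices at hand):

        # Is ZFS keeping the queue full?  If actv stays well below 35
        # while the device is not saturated, that points at a lull (2).
        # If actv is pinned at 35, %b is ~100 and throughput is still
        # short, more concurrency may be needed (3).
        iostat -xnz 1

        # Are the issued I/Os really ~128K?  A per-device size
        # distribution from the DTrace io provider checks (1).
        dtrace -n 'io:::start { @[args[1]->dev_statname] = quantize(args[0]->b_bcount); }'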

-r

Richard McDougall writes:
 > 
 > Setting larger maxphys has in my experience made quite a difference,
 > depending on the target. (actually on x64 it makes it
 > slower). Generally, my observation has been that performance
 > increases as the transfer size increases, for some classes of
 > targets. This
 > is likely due to the target's ability to seek and process a single
 > large request (for example, some targets won't cluster 8x1MB requests
 > into a single operation).
 > 
 > Henry Newman of Instrumental reported some time back that he has
 > observed gains all the way up to 32MB for sequential I/O on high-end
 > HPC-oriented arrays.
 > 
 > Regards,
 > 
 > Richard.
 >  
 > 
 > On Sat, Apr 29, 2006 at 06:12:07AM -0700, Robert Milkowski wrote:
 > > "But even
 > > if you set it to 8 Mbytes, the protocol overhead per transfer is
 > > so small that the performance gains will be barely noticeable."
 > > 
 > > Well, I can't agree. v440 with 3510 JBOD with 15K 73GB disks, connected
 > > with two links (MPxIO).
 > > 
 > > bash-3.00# dd if=/dev/zero of=/dev/rdsk/c5t500000E0119495A0d0s0  bs=1024k
 > >                     extended device statistics
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0   63.8    0.0 65381.4  0.0  1.0    0.0   15.0   0  96 ssd193
 > >                     extended device statistics
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0   66.2    0.0 67753.5  0.0  1.0    0.0   14.5   0  96 ssd193
 > >                     extended device statistics
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0   63.0    0.0 64521.3  0.0  1.0    0.0   15.2   0  96 ssd193
 > > 
 > > 
 > > bash-3.00# dd if=/dev/zero of=/dev/rdsk/c5t500000E0119495A0d0s0  bs=8192k
 > > 
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0    9.8    0.0 80642.8  0.0  0.9    0.0   93.7   0  92 c5
 > >     0.0    9.8    0.0 80641.9  0.0  0.9    0.0   93.7   0  92 
 > > c5t500000E0119495A0d0
 > >                     extended device statistics
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0   10.2    0.0 83267.8  0.0  0.9    0.0   91.1   0  93 c5
 > >     0.0   10.2    0.0 83268.1  0.0  0.9    0.0   91.1   0  93 
 > > c5t500000E0119495A0d0
 > >                     extended device statistics
 > >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 > >     0.0    9.8    0.0 80552.4  0.0  0.9    0.0   93.8   0  92 c5
 > >     0.0    9.8    0.0 80551.6  0.0  0.9    0.0   93.8   0  92 
 > > c5t500000E0119495A0d0
 > > 
 > > So 1024KB is about 25% slower in this case.
 > >  
 > >  
 > 
 > -- 
 > 
 > 
 > :-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:
 >   Richard Mc Dougall                 : [EMAIL PROTECTED]
 >   Performance and Availability       : x31542  
 >   Engineering                        : http://devnull.eng
 >   Sun Microsystems Inc               : +1 650 352 6438
 > :-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:
