Anton B. Rang writes:

> One issue is what we mean by "saturation." It's easy to bring a disk to 100% busy. We need to keep this discussion in the context of a workload. Generally when people care about streaming throughput of a disk, it's because they are reading or writing a single large file, and they want to reach as closely as possible the full media rate.
>
> Once the write cache on a device is enabled, it's quite easy to maximize write performance. All you need is to move data quickly enough into the device's buffer so that it's never found to be empty while the head is writing; and, of course, avoid ever moving the disk head away. Since the media rate is typically fairly low (e.g. 20-80 MB/sec), this isn't that hard on either FibreChannel or SCSI, and shouldn't be too difficult for ATA either. Very small requests are hurt by protocol and stack overhead, but moderately large requests (1-2 MB) can usually reach the full rate, at least for a single disk. (Disk arrays often have faster back ends than the interconnect, so are always limited by protocol and stack overhead, even for large transfers.)
>
> With a disabled write cache, there will always be some protocol and stack overhead; and with less sophisticated disks, you'll miss on average half a revolution of the disk each time you write (as you wait for the first sector to go beneath the head). More sophisticated disks will reorder data during the write, and the most sophisticated (with FC/SCSI interfaces) can actually get the data from the host out-of-order to match the sectors passing underneath the head. In this case the only way to come close to disk rates with smaller writes is to issue overlapping commands, with tags allowing the device to reorder them, and hope that the device has enough buffering to reorder all writes into sequence.
>
> Disk seeks remain the worst enemy of streaming performance, however. There's no way to avoid that. ZFS should be able to achieve high write performance as long as it can allocate blocks (including metadata) in a forward direction and minimize the number of times the überblock must be written. Reads will be more challenging unless the data was written contiguously. The biggest issue with the 128K block size for ZFS, I suspect, will be the seek between each read. Even a fast (1 ms) seek represents 60KB of lost data transfer on a disk which can transfer data at 60 MBps.
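To put rough numbers on that last paragraph before responding, here is a back-of-envelope sketch (plain Python; the 60 MB/s media rate, 1 ms seek and 128K block size are just the illustrative figures quoted above, nothing measured):

    # Back-of-envelope only: what a seek costs in streaming bandwidth,
    # and the effective rate when every 128K read pays one seek.
    # Figures are the illustrative ones quoted above, not measurements.

    MEDIA_RATE = 60e6        # bytes/sec off the platter (60 MB/s)
    SEEK_TIME  = 1e-3        # seconds per "fast" seek (1 ms)
    BLOCK_SIZE = 128 * 1024  # bytes per read (ZFS 128K record)

    # Data that could have streamed by during one seek.
    lost_per_seek = MEDIA_RATE * SEEK_TIME               # ~60 KB

    # Each block costs one seek plus its own transfer time.
    transfer_time  = BLOCK_SIZE / MEDIA_RATE             # ~2.2 ms
    effective_rate = BLOCK_SIZE / (SEEK_TIME + transfer_time)

    print("lost per seek  : %.0f KB" % (lost_per_seek / 1e3))
    print("effective rate : %.1f MB/s (%.0f%% of media rate)"
          % (effective_rate / 1e6, 100 * effective_rate / MEDIA_RATE))

By that arithmetic a 1 ms seek before every 128K read would give up roughly a third of the media rate (about 41 MB/s out of 60), which is exactly the concern.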
OK, so let's consider your 2MB read. You have the option of placing it in one contiguous spot on the disk or splitting it into 16 x 128K chunks, somewhat spread all over. Now you issue a read for that 2MB of data. As you noted, you either have to wait for the head to find the 2MB block and stream it, or you dump 16 I/O descriptors into an intelligent controller; wherever the head is, there is data to be gotten from the get-go. I can't swear it wins the game, but it should be real close.

I just did an experiment and could see > 60MB/s of data out of a 35G disk using 128K chunks (> 450 IOPS). Disruptive.

-r
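P.S. For anyone wanting to check the arithmetic behind those figures, a quick sanity check (the 450 IOPS, 60 MB/s and 128K chunk size are just the numbers quoted above; this is arithmetic, not the experiment itself):

    # Sanity check: do > 450 IOPS at 128K chunks and > 60 MB/s agree?
    # Pure arithmetic on the numbers quoted above, not the experiment.

    CHUNK = 128 * 1024   # bytes per I/O
    IOPS  = 450          # observed lower bound

    throughput = IOPS * CHUNK                    # bytes/sec
    print("450 IOPS x 128K = %.1f MB/s" % (throughput / 1e6))

    # And the other direction: IOPS needed to sustain 60 MB/s in 128K chunks.
    print("60 MB/s / 128K  = %.0f IOPS" % (60e6 / CHUNK))

Which comes out consistent: 450-odd 128K reads per second is already in the neighborhood of the 60 MB/s media rate.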