One issue is what we mean by "saturation."  It's easy enough to drive a
disk to 100% busy, so we need to keep this discussion in the context of a
workload.  Generally, when people care about the streaming throughput of
a disk, it's because they are reading or writing a single large file and
want to get as close as possible to the full media rate.

With the write cache on a device enabled, it's quite easy to maximize
write performance.  All you need to do is move data into the device's
buffer quickly enough that it's never empty while the head is writing
and, of course, avoid ever moving the disk head away.  Since the media
rate is typically fairly low (e.g. 20-80 MB/sec), this isn't that hard on
either FibreChannel or SCSI, and shouldn't be too difficult for ATA
either.  Very small requests are hurt by protocol and stack overhead, but
moderately large requests (1-2 MB) can usually reach the full rate, at
least for a single disk.  (Disk arrays often have faster back ends than
the interconnect, so they are always limited by protocol and stack
overhead, even for large transfers.)
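
To put rough numbers on that, here's a back-of-the-envelope sketch.  The
60 MB/sec media rate and the ~0.3 ms of per-command protocol and stack
overhead are illustrative assumptions, not measurements:

    # Rough model: each request pays a fixed per-command cost (protocol
    # + stack) on top of the media transfer time.  Both numbers are
    # illustrative assumptions.
    MEDIA_RATE = 60e6      # bytes/sec, assumed media rate
    OVERHEAD   = 0.3e-3    # sec/command, assumed protocol + stack cost

    for size in (4 << 10, 64 << 10, 1 << 20, 2 << 20):
        transfer  = size / MEDIA_RATE
        effective = size / (transfer + OVERHEAD)
        print("%8d bytes: %5.1f MB/sec (%4.1f%% of media rate)"
              % (size, effective / 1e6, 100.0 * effective / MEDIA_RATE))

With those assumptions a 4K request gets only about 11 MB/sec, while
1-2 MB requests land within a couple of percent of the media rate, which
is the behavior described above.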

With a disabled write cache, there will always be some protocol and stack 
overhead; and with less sophisticated disks, you'll miss on average half a 
revolution of the disk each time you write (as you wait for the first sector to 
go beneath the head). More sophisticated disks will reorder data during the 
write, and the most sophisticated (with FC/SCSI interfaces) can actually get 
the data from the host out-of-order to match the sectors passing underneath the 
head. In this case the only way to come close to disk rates with smaller writes 
is to issue overlapping commands, with tags allowing the device to reorder 
them, and hope that the device has enough buffering to reorder all writes into 
sequence.
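
Again with illustrative numbers (a 7200 RPM disk, so about 4.2 ms for
half a revolution on average, and a 60 MB/sec media rate), here's roughly
what that half-revolution miss costs when the disk can't reorder
anything:

    # Write cache disabled, no reordering: each write waits on average
    # half a revolution before its first sector comes under the head.
    # 7200 RPM and 60 MB/sec are assumed, illustrative values.
    MEDIA_RATE = 60e6                 # bytes/sec, assumed media rate
    HALF_REV   = 0.5 * 60.0 / 7200    # ~4.17 ms average rotational miss

    for size in (8 << 10, 128 << 10, 1 << 20):
        transfer  = size / MEDIA_RATE
        effective = size / (transfer + HALF_REV)
        print("%8d bytes: %5.1f MB/sec (%4.1f%% of media rate)"
              % (size, effective / 1e6, 100.0 * effective / MEDIA_RATE))

That works out to roughly 2 MB/sec for 8K writes and only about 20 MB/sec
even for 128K writes, which is why overlapped, tagged commands matter so
much here.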

Disk seeks remain the worst enemy of streaming performance, however.  There's 
no way to avoid that.  ZFS should be able to achieve high write performance as 
long as it can allocate blocks (including metadata) in a forward direction and 
minimize the number of times the überblock must be written. Reads will be more 
challenging unless the data was written contiguously.  The biggest issue
with ZFS's 128K block size, I suspect, will be the seek between each
read.  Even a fast (1 ms) seek represents 60 KB of lost data transfer on
a disk that can transfer data at 60 MB/sec.
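
Working through that last figure with the same illustrative numbers (a
1 ms seek against a 60 MB/sec media rate, and 128K reads):

    # One seek per 128K read, using the numbers from the text above:
    # a 1 ms seek and a 60 MB/sec media rate (illustrative values).
    MEDIA_RATE = 60e6          # bytes/sec, assumed media rate
    SEEK       = 1e-3          # sec, a "fast" seek
    BLOCK      = 128 << 10     # ZFS 128K block

    lost      = SEEK * MEDIA_RATE     # bytes of transfer forgone per seek
    transfer  = BLOCK / MEDIA_RATE    # ~2.2 ms to read one 128K block
    effective = BLOCK / (transfer + SEEK)
    print("transfer lost per seek: %.0f KB" % (lost / 1e3))
    print("128K reads, 1 ms seek each: %.1f MB/sec (%.0f%% of media rate)"
          % (effective / 1e6, 100.0 * effective / MEDIA_RATE))

So even a 1 ms seek per 128K read holds you to roughly 40 MB/sec, about
two thirds of the media rate, and slower seeks make it much worse.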