One issue is what we mean by "saturation." It's easy to bring a disk to 100% busy. We need to keep this discussion in the context of a workload. Generally, when people care about the streaming throughput of a disk, it's because they are reading or writing a single large file and want to get as close as possible to the full media rate.
Once the write cache on a device is enabled, it's quite easy to maximize write performance. All you need to do is move data into the device's buffer quickly enough that the buffer is never empty while the head is writing, and, of course, avoid ever moving the disk head away. Since the media rate is typically fairly low (e.g. 20-80 MB/sec), this isn't that hard over either FibreChannel or SCSI, and shouldn't be too difficult over ATA either. Very small requests are hurt by protocol and stack overhead, but moderately large requests (1-2 MB) can usually reach the full rate, at least for a single disk. (Disk arrays often have faster back ends than the interconnect, so they are always limited by protocol and stack overhead, even for large transfers.)

With the write cache disabled, there will always be some protocol and stack overhead, and with less sophisticated disks you'll lose on average half a revolution each time you write, waiting for the first sector to come around under the head. More sophisticated disks will reorder data during the write, and the most sophisticated (with FC/SCSI interfaces) can actually pull the data from the host out of order to match the sectors passing beneath the head. In this case the only way to come close to media rate with smaller writes is to issue overlapping commands, with tags that allow the device to reorder them, and hope the device has enough buffering to sort all of the writes back into sequence.

Disk seeks remain the worst enemy of streaming performance, however; there's no way around that. ZFS should be able to achieve high write performance as long as it can allocate blocks (including metadata) in a forward direction and minimize the number of times the überblock must be written. Reads will be more challenging unless the data was written contiguously. The biggest issue with ZFS's 128K block size, I suspect, will be the seek between each read: even a fast (1 ms) seek represents 60 KB of lost data transfer on a disk that can stream data at 60 MB/sec.
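To make the large-request point concrete, here's a minimal sketch (nothing ZFS-specific; the path, request size, and total amount written are arbitrary placeholders) that streams 1 MB write()s and reports the rate it saw. Running it against a file in a filesystem measures the page cache as much as the disk, and the fsync() only partly compensates, so a scratch raw device is a better target.

    /* stream_write.c -- rough sketch: sequential 1 MB writes, report MB/sec.
       Overwrites whatever is at the target path, so use a scratch file or
       scratch device only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define REQ_SIZE (1024 * 1024)   /* 1 MB per request */
    #define TOTAL_MB 256             /* total amount to write */

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/tmp/streamtest";
        struct timeval t0, t1;
        double secs;
        char *buf;
        int fd, i;

        buf = malloc(REQ_SIZE);
        if (buf == NULL) { perror("malloc"); return 1; }
        memset(buf, 0xab, REQ_SIZE);

        fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        gettimeofday(&t0, NULL);
        for (i = 0; i < TOTAL_MB; i++) {
            if (write(fd, buf, REQ_SIZE) != REQ_SIZE) { perror("write"); return 1; }
        }
        fsync(fd);                    /* wait for cached data to drain */
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d MB in %.2f sec = %.1f MB/sec\n", TOTAL_MB, secs, TOTAL_MB / secs);
        close(fd);
        free(buf);
        return 0;
    }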
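On the "overlapping commands with tags" point: an application can't issue tagged commands directly, but it can keep several writes outstanding at once so the driver and a tagged-queuing disk have something to reorder. Here's a rough sketch using POSIX AIO, under the assumption that the aio implementation actually passes that concurrency down to the device; the queue depth and sizes are arbitrary.

    /* aio_overlap.c -- rough sketch: keep QDEPTH writes of REQ_SIZE bytes in
       flight so the driver (and a tagged-queuing disk) always has several
       commands to reorder.  Compile with -lrt on systems that need it. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define REQ_SIZE (1024 * 1024)   /* 1 MB per write */
    #define QDEPTH   4               /* writes kept outstanding */
    #define NREQS    64              /* total writes issued */

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/tmp/aiotest";
        struct aiocb cb[QDEPTH];
        char *buf[QDEPTH];
        int active[QDEPTH];
        off_t offset = 0;
        int fd, i, issued = 0, done = 0;

        fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        memset(cb, 0, sizeof(cb));
        for (i = 0; i < QDEPTH; i++) {
            buf[i] = malloc(REQ_SIZE);
            if (buf[i] == NULL) { perror("malloc"); return 1; }
            memset(buf[i], 0xab, REQ_SIZE);
            cb[i].aio_fildes = fd;
            cb[i].aio_buf    = buf[i];
            cb[i].aio_nbytes = REQ_SIZE;
            cb[i].aio_offset = offset;
            offset += REQ_SIZE;
            if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
            active[i] = 1;
            issued++;
        }

        /* As each write completes, immediately issue the next one at the next
           sequential offset, so the queue never drains. */
        while (done < NREQS) {
            const struct aiocb *waitlist[QDEPTH];
            int nwait = 0;
            for (i = 0; i < QDEPTH; i++)
                if (active[i])
                    waitlist[nwait++] = &cb[i];
            if (nwait == 0)
                break;
            if (aio_suspend(waitlist, nwait, NULL) != 0) { perror("aio_suspend"); return 1; }
            for (i = 0; i < QDEPTH; i++) {
                if (!active[i] || aio_error(&cb[i]) == EINPROGRESS)
                    continue;
                if (aio_return(&cb[i]) != REQ_SIZE) { fprintf(stderr, "short or failed write\n"); return 1; }
                done++;
                active[i] = 0;
                if (issued < NREQS) {
                    cb[i].aio_offset = offset;
                    offset += REQ_SIZE;
                    if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
                    active[i] = 1;
                    issued++;
                }
            }
        }
        close(fd);
        return 0;
    }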
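The 128K-block arithmetic at the end is worth writing out. Using the same assumed numbers (60 MB/sec media rate, 1 ms seek, 128 KB block):

    /* seek_cost.c -- back-of-the-envelope numbers from the text above:
       60 MB/sec media rate, 1 ms seek, 128 KB ZFS blocks. */
    #include <stdio.h>

    int main(void)
    {
        double media_rate = 60.0 * 1024 * 1024;  /* bytes/sec off the platter */
        double seek_time  = 0.001;               /* seconds per seek */
        double block      = 128.0 * 1024;        /* bytes per ZFS block */

        double lost      = seek_time * media_rate;          /* data not streamed during the seek */
        double xfer_time = block / media_rate;               /* time to read one block */
        double effective = block / (xfer_time + seek_time);  /* rate if every block pays a seek */

        printf("data lost per 1 ms seek:  %.0f KB\n", lost / 1024);
        printf("transfer time per 128K:   %.2f ms\n", xfer_time * 1000);
        printf("effective rate w/ seeks:  %.1f MB/sec (%.0f%% of media rate)\n",
               effective / (1024 * 1024), 100.0 * effective / media_rate);
        return 0;
    }

In other words, if every 128K read pays even a very fast seek, you keep only about two-thirds of the media rate; with more typical seek times the fraction drops quickly, which is why contiguous allocation matters so much for read streaming.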