Anton B. Rang writes:

> One issue is what we mean by "saturation." It's easy to bring a disk to 100% busy. We need to keep this discussion in the context of a workload. Generally when people care about streaming throughput of a disk, it's because they are reading or writing a single large file, and they want to reach as closely as possible the full media rate.
>
> Once the write cache on a device is enabled, it's quite easy to maximize write performance. All you need is to move data quickly enough into the device's buffer so that it's never found to be empty while the head is writing; and, of course, avoid ever moving the disk head away. Since the media rate is typically fairly low (e.g. 20-80 MB/sec), this isn't that hard on either FibreChannel or SCSI, and shouldn't be too difficult for ATA either. Very small requests are hurt by protocol and stack overhead, but moderately large requests (1-2 MB) can usually reach the full rate, at least for a single disk. (Disk arrays often have faster back ends than the interconnect, so are always limited by protocol and stack overhead, even for large transfers.)
>
> With a disabled write cache, there will always be some protocol and stack overhead; and with less sophisticated disks, you'll miss on average half a revolution of the disk each time you write (as you wait for the first sector to go beneath the head). More sophisticated disks will reorder data during the write, and the most sophisticated (with FC/SCSI interfaces) can actually get the data from the host out-of-order to match the sectors passing underneath the head. In this case the only way to come close to disk rates with smaller writes is to issue overlapping commands, with tags allowing the device to reorder them, and hope that the device has enough buffering to reorder all writes into sequence.
>
> Disk seeks remain the worst enemy of streaming performance, however. There's no way to avoid that. ZFS should be able to achieve high write performance as long as it can allocate blocks (including metadata) in a forward direction and minimize the number of times the überblock must be written. Reads will be more challenging unless the data was written contiguously. The biggest issue with the 128K block size for ZFS, I suspect, will be the seek between each read. Even a fast (1 ms) seek represents 60KB of lost data transfer on a disk which can transfer data at 60 MBps.
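To put rough numbers on that last paragraph before responding, here is a back-of-envelope sketch (plain Python; the 60 MB/s media rate, 1 ms seek and 128K block size are just the illustrative figures quoted above, nothing measured):

    # Back-of-envelope only: what a seek costs in streaming bandwidth,
    # and the effective rate when every 128K read pays one seek.
    # Figures are the illustrative ones quoted above, not measurements.

    MEDIA_RATE = 60e6        # bytes/sec off the platter (60 MB/s)
    SEEK_TIME  = 1e-3        # seconds per "fast" seek (1 ms)
    BLOCK_SIZE = 128 * 1024  # bytes per read (ZFS 128K record)

    # Data that could have streamed by during one seek.
    lost_per_seek = MEDIA_RATE * SEEK_TIME               # ~60 KB

    # Each block costs one seek plus its own transfer time.
    transfer_time  = BLOCK_SIZE / MEDIA_RATE             # ~2.2 ms
    effective_rate = BLOCK_SIZE / (SEEK_TIME + transfer_time)

    print("lost per seek  : %.0f KB" % (lost_per_seek / 1e3))
    print("effective rate : %.1f MB/s (%.0f%% of media rate)"
          % (effective_rate / 1e6, 100 * effective_rate / MEDIA_RATE))

By that arithmetic a 1 ms seek before every 128K read would give up roughly a third of the media rate (about 41 MB/s out of 60), which is exactly the concern.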
OK, so let's consider your 2MB read. You have the option of placing it in one contiguous spot on the disk or splitting it into 16 x 128K chunks, somewhat spread all over. Now you issue a read for that 2MB of data. As you noted, you either have to wait for the head to find the 2MB block and stream it, or you dump 16 I/O descriptors into an intelligent controller; wherever the head is, there is data to be gotten from the get-go. I can't swear it wins the game, but it should be real close.

I just did an experiment and could see > 60MB/s of data out of a 35G disk using 128K chunks (> 450 IOPS). Disruptive.

-r
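P.S. For anyone wanting to check the arithmetic behind those figures, a quick sanity check (the 450 IOPS, 60 MB/s and 128K chunk size are just the numbers quoted above; this is arithmetic, not the experiment itself):

    # Sanity check: do > 450 IOPS at 128K chunks and > 60 MB/s agree?
    # Pure arithmetic on the numbers quoted above, not the experiment.

    CHUNK = 128 * 1024   # bytes per I/O
    IOPS  = 450          # observed lower bound

    throughput = IOPS * CHUNK                    # bytes/sec
    print("450 IOPS x 128K = %.1f MB/s" % (throughput / 1e6))

    # And the other direction: IOPS needed to sustain 60 MB/s in 128K chunks.
    print("60 MB/s / 128K  = %.0f IOPS" % (60e6 / CHUNK))

Which comes out consistent: 450-odd 128K reads per second is already in the neighborhood of the 60 MB/s media rate.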