[assuming we're talking about disks and not "hardware RAID arrays"...]

On Tue, 2006-05-30 at 11:43 -0500, Anton Rang wrote:
> > Sure, the block size may be 128KB, but ZFS can bundle more than one
> > per-file/transaction
>
> But it doesn't right now, as far as I can tell.

The protocol overhead is still orders of magnitude smaller than the
time for a rev (one platter revolution). Sure, there are pathological
cases such as FC-AL over 200 km with 100+ nodes, but most folks won't
hurt themselves like that. For modern disks, multiple 128 kByte
transfers will spend a long time in the disk's buffer cache waiting
to be written to media.

> I never see ZFS issuing
> a 16 MB write, for instance. You simply can't get the same performance
> from a disk array issuing 128 KB writes that you can with 16 MB writes.
> It's physically impossible because of protocol overhead, even if the
> controller itself were infinitely fast. (There's also the issue that at
> 128 KB, most disk arrays will choose to cache rather than stream the
> data, since it's less than a single RAID stripe, which slows you down.)

Very few disks have 16 MByte write buffer caches, so if you want to
send such a large iop down the wire (DAS please, otherwise you kill
the SAN), then you'll be waiting on the media anyway. The disk
interconnect is faster than the media, so I don't see how you could
avoid blowing a rev in that case.

Surely there is a more generally applicable blocksize which is
appropriate. Since many disks today do support queued commands, I
don't see the 128 kByte iop as a large, inherent limitation. OTOH,
the jury is still out...
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
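
The argument above compares three times: the per-command protocol overhead, the time to move an iop over the interconnect, and the time the media needs to absorb it (measured against one rev). A rough back-of-the-envelope sketch in Python may make the comparison easier to follow. Every number here (spindle speed, link rate, media rate, per-command overhead) is an assumption picked for illustration, not a figure from this thread or a measurement of any real disk.

```python
# Illustrative numbers only: none of these figures come from the thread.
RPM = 7200                  # assumed spindle speed
LINK_BYTES_PER_S = 200e6    # assumed interconnect rate (roughly 2 Gbit/s FC)
MEDIA_BYTES_PER_S = 60e6    # assumed sustained media rate
CMD_OVERHEAD_S = 50e-6      # assumed per-command protocol overhead


def rev_time_s(rpm):
    """Time for one platter revolution, in seconds."""
    return 60.0 / rpm


def wire_time_s(nbytes, link_rate=LINK_BYTES_PER_S, overhead=CMD_OVERHEAD_S):
    """Time to push one command carrying nbytes over the interconnect."""
    return overhead + nbytes / link_rate


def media_time_s(nbytes, media_rate=MEDIA_BYTES_PER_S):
    """Time for the heads to write nbytes at the sustained media rate."""
    return nbytes / media_rate


def summarize(nbytes):
    """Collect the three times (in ms) plus overhead as a fraction of a rev."""
    rev = rev_time_s(RPM)
    return {
        "wire_ms": wire_time_s(nbytes) * 1e3,
        "media_ms": media_time_s(nbytes) * 1e3,
        "rev_ms": rev * 1e3,
        "overhead_per_rev": CMD_OVERHEAD_S / rev,
    }


if __name__ == "__main__":
    for label, size in (("128 KiB", 128 * 1024), ("16 MiB", 16 * 1024 * 1024)):
        s = summarize(size)
        print(f"{label}: wire {s['wire_ms']:.2f} ms, "
              f"media {s['media_ms']:.2f} ms, rev {s['rev_ms']:.2f} ms, "
              f"overhead/rev {s['overhead_per_rev']:.4f}")
```

Under these assumed rates the data arrives over the wire faster than the media can take it for both sizes, so the transfer ends up waiting on the media. Different assumptions (a slower link, a much higher per-command overhead, an array controller in the path) would shift the balance, which is presumably where the two positions in this thread part ways.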