On 09/16/09 14:19, Richard Elling wrote:
> On Sep 16, 2009, at 1:09 PM, Bob Friesenhahn wrote:
>> On Wed, 16 Sep 2009, Thomas Burgess wrote:
>>> hrm, i always thought raidz took longer....learn something every day =)
>> And you were probably right, in spite of Richard's lack of knowledge
>> of a study or the feeling in his gut. Just look at the many postings
>> here about resilvering and you will see far more complaints about
>> raidz taking a long time.
> Actually, I had a ton of data on resilvering which shows mirrors and
> raidz equivalently bottlenecked on the media write bandwidth. However,
> there are other cases which are IOPS bound (or CR bound :-) which
> cover some of the postings here. I think Sommerfeld has some other
> data which could be pertinent.
This primarily has to do with the stripe width and block size. The
difference between mirroring and RAID-Z is that with RAID-Z each ZFS
block is further chunked up into smaller pieces and distributed across
the stripe. So if you have a wide stripe (e.g. 32 disks), a 128k block
can be chunked up into 4k pieces, while a block with a small recordsize
is chunked even smaller (e.g. an 8k block into 1k or 512-byte pieces).
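The arithmetic behind those figures can be sketched as below. This is a simplified model, not how RAID-Z actually lays out data: it assumes 512-byte sectors, treats every column of the stripe as a data column (ignoring parity, as the numbers above do), and splits the block evenly before rounding each piece up to a whole sector. The function name `chunk_size` is hypothetical.

```python
import math

SECTOR = 512  # assumed minimum allocation unit (512-byte sectors)


def chunk_size(block_bytes: int, width: int, sector: int = SECTOR) -> int:
    """Approximate per-disk piece of one ZFS block on a RAID-Z stripe.

    Simplified model: the block is split evenly across `width` columns
    and each piece is rounded up to a whole sector. Parity columns and
    RAID-Z's uneven sector distribution are ignored.
    """
    sectors = math.ceil(block_bytes / sector)  # sectors in the whole block
    per_disk = math.ceil(sectors / width)      # sectors landing on each column
    return per_disk * sector


for block, width in [(128 * 1024, 32), (8 * 1024, 8), (8 * 1024, 32)]:
    print(f"{block // 1024}k block, {width}-wide: "
          f"{chunk_size(block, width)} bytes per disk")
```

Under this model a 128k block on a 32-wide stripe gives 4k per disk, and an 8k block gives 1k on an 8-wide stripe or bottoms out at one 512-byte sector on a 32-wide one.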
ZFS resilvering is metadata-based to allow for efficient resilvering
after outages, but when a relatively full disk needs to be replaced you
end up bottlenecked on the metadata traversal. If your blocks are chunked
up small enough, this becomes a random I/O benchmark for the good disks
in the RAID stripe. If your pool is backed by 7200 RPM disks, this can
end up taking a very long time.
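A back-of-envelope sketch of why small pieces hurt: if every piece read during the traversal is a separate random I/O, the time is set by the disk's IOPS rather than its bandwidth. The defaults below (100 random IOPS and 100 MB/s sequential for a 7200 RPM drive) are assumed round figures, not measurements, and the function name `resilver_hours` is hypothetical. The model simply takes whichever limit is slower.

```python
def resilver_hours(data_bytes: float, chunk_bytes: int,
                   iops: float = 100.0, bandwidth: float = 100e6) -> float:
    """Rough resilver time (hours) for one disk's worth of data.

    Assumes each chunk is a separate random I/O and that the disk is
    limited by the slower of its random IOPS or sequential bandwidth.
    The iops/bandwidth defaults are assumed figures for a 7200 RPM disk.
    """
    n_ios = data_bytes / chunk_bytes          # one I/O per chunk read
    iops_seconds = n_ios / iops               # time if IOPS-bound
    bandwidth_seconds = data_bytes / bandwidth  # time if bandwidth-bound
    return max(iops_seconds, bandwidth_seconds) / 3600


# Hypothetical 500 GB of data on the replaced disk, at various chunk sizes.
for chunk in (1 << 20, 128 * 1024, 4096, 512):
    print(f"{chunk:>8} B chunks: {resilver_hours(500e9, chunk):8.1f} h")
```

With large chunks the estimate is bandwidth-bound (the case where mirrors and raidz look alike), while 4k or 512-byte chunks push it into the IOPS-bound regime by orders of magnitude.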
The ZFS team is actively working on improvements in this area.
- Eric
--
Eric Schrock, Fishworks http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss