On 09/16/09 14:19, Richard Elling wrote:
On Sep 16, 2009, at 1:09 PM, Bob Friesenhahn wrote:

On Wed, 16 Sep 2009, Thomas Burgess wrote:

hrm, i always thought raidz took longer....learn something every day =)

And you were probably right, in spite of Richard's lack of knowledge of a study or the feeling in his gut. Just look at the many postings here about resilvering and you will see far more complaints about raidz taking a long time.

Actually, I had a ton of data on resilvering which shows mirrors and
raidz equivalently bottlenecked on the media write bandwidth. However,
there are other cases which are IOPS bound (or CR bound :-) which
cover some of the postings here. I think Sommerfeld has some other
data which could be pertinent.

This primarily has to do with the stripe width and block size. The difference between mirroring and RAID-Z is that with RAID-Z each ZFS block is split again into smaller chunks, which are distributed across the stripe. So if you have a wide stripe (e.g. 32 disks), a 128k block can be chunked up into 4k pieces, while a block written with a small recordsize is chunked even smaller (e.g. 8k down to 1k or 512 bytes).
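
To make the arithmetic above concrete, here is a rough sketch in Python. It only reproduces the simple division used in this mail (block size over stripe width); it ignores parity columns, padding, and compression, and the function name and the 512-byte sector floor are assumptions for illustration, not anything taken from the ZFS code.

```python
def approx_chunk_bytes(block_bytes, stripe_width, sector_bytes=512):
    """Rough per-disk chunk size for one ZFS block on a RAID-Z stripe.

    Assumption: plain division of the block across the stripe, as in the
    mail above. Parity and padding are ignored, and a chunk is never
    smaller than one sector (sector_bytes is an assumed 512-byte floor).
    """
    if stripe_width < 1:
        raise ValueError("stripe_width must be at least 1")
    return max(block_bytes // stripe_width, sector_bytes)


if __name__ == "__main__":
    KIB = 1024
    # Wide stripe with the default 128k recordsize.
    print(approx_chunk_bytes(128 * KIB, 32))
    # Small 8k recordsize on narrower stripes.
    print(approx_chunk_bytes(8 * KIB, 8))
    print(approx_chunk_bytes(8 * KIB, 16))
```

The point is only that the per-disk I/O size shrinks as the stripe gets wider or the recordsize gets smaller.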

ZFS resilvering is metadata based, which allows efficient resilvering after an outage, but when a relatively full disk needs to be replaced you end up bottlenecked on the metadata traversal. If your blocks are chunked up small enough, this becomes a random I/O benchmark for the good disks in the RAID stripe. If your pool is backed by 7200 RPM disks, this can end up taking a very long time.
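
As a back-of-the-envelope illustration of why small chunks hurt, the sketch below compares a random-I/O-bound estimate with a bandwidth-bound one. Everything in it is an assumption for illustration: the function names are made up, it pessimistically charges one random I/O per chunk on each surviving disk, and the IOPS and bandwidth figures you feed it are your own guesses for your drives, not measurements from this thread.

```python
import math


def random_bound_hours(used_bytes_per_disk, chunk_bytes, disk_iops):
    """Hours to read used_bytes_per_disk if every chunk costs one random I/O.

    Assumption: a pessimistic model where no reads are sequential or
    coalesced, so the disk's random IOPS is the only limit.
    """
    ios = math.ceil(used_bytes_per_disk / chunk_bytes)
    return ios / disk_iops / 3600


def sequential_bound_hours(used_bytes_per_disk, bytes_per_second):
    """Hours to move the same data if media bandwidth is the only limit."""
    return used_bytes_per_disk / bytes_per_second / 3600


if __name__ == "__main__":
    used = 500 * 10**9      # hypothetical: 500 GB in use per disk
    iops = 100              # hypothetical random IOPS for a 7200 RPM disk
    bandwidth = 80 * 10**6  # hypothetical 80 MB/s streaming bandwidth

    for chunk in (128 * 1024, 4 * 1024, 512):
        print(chunk, round(random_bound_hours(used, chunk, iops), 1))
    print("sequential", round(sequential_bound_hours(used, bandwidth), 1))
```

Real resilvers land somewhere between the two bounds, but the gap between them grows quickly as the chunk size drops.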

The ZFS team is actively working on improvements in this area.

- Eric

--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
