On 23-Jun-09, at 1:58 PM, Erik Trimble wrote:
> Richard Elling wrote:
>> Erik Trimble wrote:
>>> All this discussion hasn't answered one thing for me: exactly
>>> _how_ does ZFS do resilvering? Both in the case of mirrors, and
>>> of RAIDZ[2]?
>>> I've seen some mention that it goes in chronological order of file
>>> creation (which, to me, means that the metadata must be read
>>> first), and that only used blocks are rebuilt, but exactly what
>>> is the methodology being used?
>> See Jeff Bonwick's blog on the topic:
>> http://blogs.sun.com/bonwick/entry/smokin_mirrors
>> -- richard
> That's very informative. Thanks, Richard.
> So, ZFS walks the used-block tree to see what still needs
> rebuilding. I guess I have two related questions, then:
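For anyone following along, here's a minimal sketch of what "walking the used-block tree" means in the abstract: a top-down traversal that visits only allocated block pointers, so free space is never read at all. (The class and field names below are my own illustration, not ZFS's actual on-disk structures.)

```python
# Illustrative only: resilver-style traversal that touches only blocks
# actually referenced from the root, never unallocated space.

class BlockPtr:
    """Hypothetical block pointer; 'children' stands in for indirect blocks."""
    def __init__(self, addr, children=None):
        self.addr = addr                # location of this block on disk
        self.children = children or []  # indirect blocks point at more blocks

def walk_used_blocks(root):
    """Yield every allocated block address reachable from the root."""
    stack = [root]
    while stack:
        bp = stack.pop()
        yield bp.addr
        stack.extend(bp.children)       # free space is simply never visited

# A tiny tree: one root, one indirect block, two data blocks.
tree = BlockPtr(0, [BlockPtr(100, [BlockPtr(200), BlockPtr(300)])])
print(sorted(walk_used_blocks(tree)))   # only the 4 referenced addresses
```

The point is that rebuild cost scales with *used* blocks, not raw disk capacity, which is why the layout of those blocks matters so much below.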
> (1) Are these blocks some fixed size (based on the media - usually
> 512 bytes), or are they "ZFS blocks" - the variable size based on
> the requirements of the original file size being written?
> (2) Is there some reasonable way to read multiples of these
> blocks in a single IOP? Theoretically, if the blocks are in
> chronological creation order, they should be (relatively)
> sequential on the drive(s). Thus, ZFS should be able to read in
> several of them without forcing a random seek.
(I think) the disk's internal scheduling could help out here if they
are indeed close to physically sequential.
--Toby
> That is, you should be able to get multiple blocks in a single IOP.
> If we can't get multiple ZFS blocks in one sequential read, we're
> screwed - ZFS is going to be IOPS-bound on the replacement disk,
> with no real workaround. Which means rebuild times for disks with
> lots of small files are going to be hideous.
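The back-of-envelope arithmetic bears this out. Assuming (my numbers, purely for illustration) one random I/O per small-file block and a 7200 RPM disk doing on the order of 100 random IOPS:

```python
# Worst case: one I/O per block, bounded by the disk's random IOPS,
# regardless of how much bandwidth the disk has.

def resilver_hours(num_blocks, iops):
    """Hours to issue num_blocks random reads at the given IOPS rate."""
    return num_blocks / iops / 3600.0

# e.g. 20 million small blocks at ~100 random IOPS
print(round(resilver_hours(20_000_000, 100), 1))   # ~55.6 hours
```

If coalescing works and those reads merge into large sequential runs, the same rebuild becomes bandwidth-bound instead and finishes in a small fraction of that time, which is exactly the concern raised above.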
> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss