> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> Sent: Monday, December 20, 2010 11:46 AM
> To: 'Lanky Doodle'; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] A few questions
>
> > From: zfs-discuss-boun...@opensolaris.org
> > [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Lanky Doodle
> >
> > > I believe Oracle is aware of the problem, but most of
> > > the core ZFS team has left. And of course, a fix for
> > > Oracle Solaris no longer means a fix for the rest of us.
> >
> > OK, that is a bit concerning then. As good as ZFS may be, I'm not sure
> > I want to commit to a file system that is 'broken' and may not be fully
> > fixed, if at all.
>
> ZFS is not "broken." It does, however, have a weak spot: resilver is very
> inefficient. For example:
>
> My server is built from 10krpm SATA drives, 1TB each, and each drive can
> sustain about 1 Gbit/sec of sequential read/write. So if I could resilver
> an entire drive (in a mirror) sequentially, it would take roughly
> 8,000 sec = 133 minutes -- about 2 hours. In reality, my pool uses ZFS
> mirrors, the disks are around 70% full, and a resilver takes 12-14 hours.
>
> So although resilver is "broken" by some standards, the time is bounded,
> and you can keep it survivable by using mirrors instead of raidz. For most
> people, even a 5-disk or 7-disk raidzN will still be fine. But it becomes
> unsustainable once you get up to, say, a 21-disk raidz3.
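Ned's back-of-the-envelope number checks out. A quick sketch of the arithmetic (the 1 TB capacity and 1 Gbit/s rate are his figures; decimal units are my assumption):

```python
# Ideal sequential whole-disk resilver time, per Ned's figures.
# Assumes decimal units: 1 TB = 1e12 bytes, 1 Gbit/s = 1.25e8 bytes/s.
capacity_bytes = 1e12        # 1 TB drive
throughput_bps = 1e9 / 8.0   # 1 Gbit/s sustained sequential rate, in bytes/s

seconds = capacity_bytes / throughput_bps
minutes = seconds / 60
print(seconds)   # 8000.0 seconds
print(minutes)   # ~133 minutes, about 2 hours
```

The 12-14 hours he actually observes is the gap between this sequential ideal and ZFS's metadata-order resilver.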
This argument keeps coming up on the list, but I haven't seen anyone make a good suggestion about whether this can even be 'fixed', or how it would be done.

As I understand it, there are two basic types of array reconstruction. In a mirror you can make a block-by-block copy, and that's easy. In a parity array you have to perform a calculation on the existing data and/or existing parity to reconstruct the missing piece. That is pretty easy when you can guarantee that all your stripes are the same width and start/end on the same sectors/boundaries/whatever, so you know a piece of every stripe lives on every drive in the set. I don't think this is possible with ZFS, since we have variable stripe width: a failed disk d may or may not contain data from stripe s (or transaction t), and that has to be discovered by looking at the transaction records. Right?

Can someone speculate as to how you could rebuild a variable-stripe-width array without replaying all the available transactions? I am no filesystem engineer, but I can't wrap my head around how this could be handled any better than it already is.

I've read that resilvering is throttled, presumably to keep performance degradation to a minimum during the process. Maybe that could be made a tunable (e.g. priority: low, normal, high)? And do we know whether resilvers on a mirror are actually handled differently from those on a raidz?

Sorry if this has already been explained. I think this is an issue that everyone who uses ZFS should understand completely before jumping in, because the behavior, while not 'wrong', is clearly NOT the same as with more conventional arrays.

-Will

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
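On the throttling question: OpenSolaris-era builds do expose the scrub/resilver throttle as kernel tunables rather than a zpool-level priority knob. The names below (zfs_resilver_delay, zfs_resilver_min_time_ms) are my assumption of what your particular build exposes -- verify them with mdb before relying on this. A sketch of an /etc/system fragment that would push resilver toward "high priority":

```
* /etc/system fragment -- assumed tunables, check your build first
* Delay (in clock ticks) inserted between resilver I/Os; 0 = no throttle
set zfs:zfs_resilver_delay = 0
* Minimum time (ms) spent resilvering per txg sync; default is lower
set zfs:zfs_resilver_min_time_ms = 5000
```

This trades foreground I/O latency for resilver speed, which is the priority knob Will is asking for, just without a friendly name.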