>>>>> "ck" == Christo Kutrovsky <kutrov...@pythian.com> writes:

    ck> I could always put "copies=2" (or more) to my important
    ck> datasets and take some risk and tolerate such a failure.

copies=2 has proven to be mostly useless in practice.

If there were a real-world device that tended to randomly flip bits,
or randomly replace swaths of LBAs with zeroes, but otherwise behaved
normally (not return any errors, not slow down retrying reads, not
fail to attach), then copies=2 would be really valuable, but so far it
seems no such device exists.  If you actually explore the errors that
happen in practice, I venture there are few to no cases where copies=2
would save you.
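
For reference, setting the property is a one-liner (the pool/dataset
name below is made up, and note that copies only applies to blocks
written after you set it, not to data already on disk):

    # hypothetical dataset name
    zfs set copies=2 tank/important
    zfs get copies tank/important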

One case where such a device appears to exist but doesn't really is
what I often end up doing for family/friend laptops and external USB
drives: wait for drives to start going bad, then rescue them with 'dd
conv=noerror,sync', fsck, and hope for ``most of'' the data.  copies=2
would have helped get more out of the rescued drive in some but not
all of the times I've done this, but there is not much point: Time
Machine or rsync backups, or NFS/iSCSI-booting, or zfs send | zfs recv
replication to a backup pool, are smarter.  I've never found myself
recovering a stupidly-vulnerable drive like that in a situation where
I had ZFS on it, so I'm not sure copies=2 will get used here much
either.
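
For concreteness, the rescue recipe I mean is roughly the following, a
sketch assuming a Linux box and an ext filesystem; the device and file
names are placeholders.  conv=noerror keeps dd going past read errors,
and sync pads the failed reads with zeroes so offsets stay aligned:

    # /dev/sdX1 is the failing partition -- a placeholder name
    dd if=/dev/sdX1 of=rescue.img bs=64k conv=noerror,sync
    fsck -y rescue.img    # repair the copy, never the dying original

A smaller bs loses less data around each bad spot at the cost of a much
slower copy; GNU ddrescue is the nicer tool for this if it's handy.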

One particular case of doom: a lot of people want to make two
unredundant vdevs, set 'copies=2', and rely on ZFS's promise to spread
the two copies out as much as possible.  Then they expect to import
the pool with only one of the two vdevs present and read ``some but
not all'' of the data: ``I understand I won't get all of it, but I
just want ZFS to try its best and we'll see.''  Maybe you want to do
this instead of a mirror so you can have scratch datasets that consume
space at half the rate they would on a mirror.  Nope, nice try, but it
won't happen.  ZFS is full of webbed assertions that ratchet you
safely through sane, regression-testable, supportable pool states, so
it will refuse to import a pool that isn't vdev-complete, and no
negotiation is possible on this.  The dream is a FAQ and the answer is
a clear ``No'' followed by ``you'd better test with file vdevs next
time you have such a dream.''
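
If you want to convince yourself, the file-vdev experiment costs
nothing.  A minimal sketch, with made-up names and sizes:

    mkfile 128m /var/tmp/vdev0 /var/tmp/vdev1
    zpool create dreampool /var/tmp/vdev0 /var/tmp/vdev1
    zfs set copies=2 dreampool
    # write some data, then simulate losing one of the two vdevs
    zpool export dreampool
    mv /var/tmp/vdev1 /var/tmp/vdev1.gone
    zpool import -d /var/tmp dreampool
    # the import refuses: the pool is missing a top-level vdev

(mkfile is the Solaris spelling; elsewhere something like
'truncate -s 128m' makes the backing files just as well.)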

    ck> What are the chances for a very specific drive to fail in 2
    ck> way mirror?

This may not be what you mean, but in general single device redundancy
isn't ideal for two reasons:

 * speculation (although maybe not operational experience?) that
   modern drives are so huge that even a good drive will occasionally
   develop an unreadable spot and still be within its BER (bit error
   rate) spec.  So, without redundancy, you cannot be sure of reading
   everything back, even if all the drives are ``good''.  (See the
   back-of-envelope figure after this list.)

 * drives do not immediately turn red and start brrk-brrking when they
   go bad.  In the real world, they develop latent sector errors,
   which you won't discover, and so won't mark the drive bad, until
   you scrub or coincidentally happen to read the file that
   accumulated the error.  It's possible for a second drive to go bad
   in the interval while you're waiting to discover the first.  This
   usually gets retold as ``a drive went bad while I was resilvering!
   what bad luck.  If only I could've resilvered faster to close this
   window of vulnerability, I'd not be in such a terrible
   situation,'' but the retelling is wrong: what's really happening is
   that a resilver implies a scrub, so it uncovers the second bad
   drive you didn't yet know was bad at the time you discovered the
   first.
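
To put a rough number on the first point (back-of-envelope only, using
the commonly quoted consumer-drive spec of one unrecoverable read
error per 10^14 bits read, not any particular datasheet):

    1 URE per 10^14 bits  =  1 per 1.25 * 10^13 bytes  ~=  1 per 12.5 TB read

so reading a 4 TB drive end to end gives on the order of 0.3 expected
unreadable sectors, even from a drive that is entirely within spec.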
