>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:

    re>     If your pool is not redundant, the chance that data
    re> corruption can render some or all of your data inaccessible is
    re> always present.

1. Data corruption != unclean shutdown.

2. Other filesystems do not need a mirror to recover from an unclean
   shutdown.  They only need it when disks fail outright, or when
   disks misremember their contents (silent corruption, as in the
   NetApp paper).

   I would call data corruption and silent corruption the same thing:
   what the CKSUM column was _supposed_ to count, though not in fact
   the only thing it counts.

3. Saying ZFS needs a mirror to recover from an unclean shutdown does
   not agree with the claim that it is ``always consistent on the
   disk''.

4. I'm not sure exactly what your position is.  Before, you were
   saying that what Erik warned about doesn't happen because there's
   no CR, and that Tom must be confused too.  Now you're saying that
   of course it happens, and that ZFS's claim of ``always consistent
   on disk'' counts for nothing unless you have pool redundancy.


And that is exactly what I said to start with:

    re> In general, ZFS can only repair conditions for which it owns
    re> data redundancy.

     c> If that's really the excuse for this situation, then ZFS is
     c> not ``always consistent on the disk'' for single-VDEV pools.

Is that the take-home message?

If so, it still leaves me with a concern: what if the failure of one
component in a mirrored vdev takes my system down uncleanly?  This
seems like a really plausible failure mode (as Tom said, ``the
inevitable kernel panic'').

In that case, I no longer have any redundancy when the system boots
back up.  If ZFS labels as ``data corruption'' the inconsistent
states through which it apparently sometimes transitions pools, and
depends on redundancy to recover from them, then isn't it extremely
dangerous to remove power or SAN connectivity from any DEGRADED pool?
The pool should be rebuilt onto a hot spare IMMEDIATELY so that it's
ONLINE as soon as possible, because if ZFS loses power with a
DEGRADED pool, all bets are off.
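
(To make concrete what I mean by rebuilding immediately, a minimal
sketch using hot spares; the pool and device names here are made up
for illustration:

    # keep a hot spare attached to the pool ahead of time
    zpool add tank spare c2t0d0

    # the moment c1t1d0 faults, resilver onto the spare by hand,
    # if it hasn't already been kicked off automatically
    zpool replace tank c1t1d0 c2t0d0

    # watch zpool status until the pool is back ONLINE
    zpool status tank

The point is to minimize the window during which the pool sits
DEGRADED.)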

If this DEGRADED-pool unclean shutdown is, as you say, a completely
different scenario from single-vdev pools, one that isn't dangerous
and has no trouble with ZFS corruption, then no one should ever run a
single-vdev pool.  We should instead run mirrored vdevs that are
always DEGRADED, since this configuration looks identical to
everything outside ZFS but supposedly magically avoids the issue.  If
only we had some way to attach to vdevs fake mirror components that
immediately get marked FAULTED, then we could avoid the corruption
risk.  But that's clearly absurd!

So, let's say ZFS's requirement is, as we seem to be describing it:
you might lose the whole pool if your kernel panics or you pull the
power cord in a situation without redundancy.  Then I think this is
an extremely serious issue, even for redundant pools, because it is
very plausible that a machine will panic or lose power during a
resilver.

And if, on the other hand, ZFS doesn't transition disks through
inconsistent states and then excuse itself by calling what it did
``data corruption'' when it bites you after an unclean shutdown, then
what happened to Erik and Tom?

It seems to me this is ZFS's fault, and it can't be punted off as the
administrator ``asking for it.''
