[I think Miles and I are talking about two different topics]

Miles Nordin wrote:
>>>>>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>>>>>>             
>
>     re>     If your pool is not redundant, the chance that data
>     re> corruption can render some or all of your data inaccessible is
>     re> always present.
>
> 1. data corruption != unclean shutdown
>   

Agree.  One is a state, the other is an event.

> 2. other filesystems do not need a mirror to recover from unclean
>    shutdown.  They only need it for when disks fail, or for when disks
>    misremember their contents (silent corruption, as in NetApp paper).
>   

Agree.  ZFS fits this category.

>    I would call data corruption and silent corruption the same thing:
>    what the CKSUM column was _supposed_ to count, though not in fact
>    the only thing it counts.
>   

Agree.  Data corruption takes two forms: detectable and undetectable
(aka silent).

> 3. saying ZFS needs a mirror to recover from unclean shutdown does not
>    agree with the claim ``always consistent on the disk''
>   

Disagree.  We test ZFS with unclean shutdowns all of the time and
it works fine.  However, if there is data corruption, ZFS may not be
able to recover unless there is a surviving copy of the good data.
That is what mirrors and raidz provide.
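
For illustration, a minimal sketch of what I mean by owning data
redundancy (the pool and device names below are made up; substitute
your own):

  # a two-way mirror keeps a second copy of every block
  zpool create tank mirror c0t0d0 c0t1d0

  # or raidz, which can reconstruct the contents of any one failed
  # or corrupted device from the others
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0

  # with redundancy in the pool, a scrub can repair checksum errors
  # instead of just reporting them
  zpool scrub tank
  zpool status -v tank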

> 4. I'm not sure exactly your position.  Before you were saying what
>    Erik warned about doesn't happen, because there's no CR, and Tom
>    must be confused too.  Now you're saying of course it happens,
>    ZFS's claims of ``always consistent on disk'' count for nothing
>    unless you have pool redundancy.
>   

No, I'm saying that data corruption without a surviving good copy
of the data may lead to an unrecoverable data condition.

>
> And that is exactly what I said to start with:
>
>     re> In general, ZFS can only repair conditions for which it owns
>     re> data redundancy.
>
>      c> If that's really the excuse for this situation, then ZFS is
>      c> not ``always consistent on the disk'' for single-VDEV pools.
>
> that is the take-home message?
>   

ZFS is always consistent on disk.  If there is data corruption, then
all bets are off, no matter what file system you choose.

> If so, it still leaves me with the concern, what if the breaking of
> one component in a mirrored vdev takes my system down uncleanly?  This
> seems like a really plausible failure mode (as Tom said, ``the
> inevitable kernel panic'').
>   

Tom has not provided any data as to why the kernel panicked.
Panic messages, at a minimum, would be enlightening.
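
For anyone following along, this is roughly where to look on Solaris
(exact paths and output will vary by system):

  # the panic string and stack, if they made it into the system log
  grep -i panic /var/adm/messages

  # if savecore captured a crash dump, ::msgbuf in mdb shows the
  # panic message from the dump
  cd /var/crash/`hostname`
  echo ::msgbuf | mdb unix.0 vmcore.0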

> In that case, I no longer have any redundancy when the system boots
> back up.  If ZFS calls the inconsistent states through which it
> apparently sometimes transitions pools ``data corruption'' and depends
> on redundancy to recover from them, then isn't it extremely dangerous
> to remove power or SAN connectivity from any DEGRADED pool?  The pool
> should be rebuilt onto a hot spare IMMEDIATELY so that it's ONLINE as
> soon as possible, because if ZFS loses power with a DEGRADED pool all
> bets are off.
>   

In Tom's case, ZFS was not configured such that it could rebuild a
failed device onto a hot spare.
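
For reference, adding a spare is a one-liner, though it only helps if
the pool has redundancy to resilver from (device name is hypothetical):

  # give ZFS a spare it can pull in when a device faults
  zpool add tank spare c0t2d0

  # the spare shows up as AVAIL until it is needed
  zpool status tank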

> If this DEGRADED-pool unclean shutdown is, as you say, a completely
> different scenario from single-vdev pools that isn't dangerous and has
> no trouble with ZFS corruption, then no one should ever run a
> single-vdev pool.  We should instead run mirrored vdevs that are
> always DEGRADED, since this configuration looks identical to
> everything outside ZFS but supposedly magically avoids the issue.  If
> only we had some way to attach to vdevs fake mirror components that
> immediately get marked FAULTED then we can avoid the corruption risk.
> But, that's clearly absurd!
>   

Fast, reliable, inexpensive: pick two.

> so, let's say ZFS's requirement is, as we seem to be describing it:
> might lose the whole pool if your kernel panics or you pull the power
> cord in a situation without redundancy.  Then I think this is an
> extremely serious issue, even for redundant pools.  

Agree.  But in Tom's case, there is no proof that the fault
condition has cleared.  The fact that zpool import fails with an
I/O error is a strong indicator that the fault is still present.
We do not yet know if there is a data corruption issue or not.
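
A quick sketch of how to see what ZFS itself thinks is wrong before
assuming corruption:

  # with no arguments, zpool import lists the pools it can find, the
  # state of each device, and whether the pool looks importable
  zpool import

  # if the devices have moved, point it at the right directory
  zpool import -d /dev/dsk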

> It is very
> plausible that a machine will panic or lose power during a resilver.
>   

I think this is an unfounded statement.  There are many cases where
resilvers complete successfully.  In our data reliability models, we have
a parameter for the probability of an [un]successful resilver, but all of
our research into determining a value for that parameter centers on actual
data loss or corruption in the devices.  Do you have research that points
to another cause?

> And if, on the other hand, ZFS doesn't transition disks through
> inconsistent states and then excuse itself calling what it did ``data
> corruption'' when it bites you after an unclean shutdown, then what
> happened to Erik and Tom?  
>   

I have no idea what happened to Erik.  His post makes claims of
loss followed by claims of unfixed, known problems, but no real
pointer to bugids.  Hence my comment about his post being of
the "your baby is ugly" variety.  At least point out the mole in the
middle of the forehead, aka CR???

> It seems to me it is ZFS's fault and can't be punted off to the
> administrator's ``asking for it.''
>   

I think the jury is still out.  Tom needs to complete his tests and
provide the messages and FMA notifications so that a root cause
can be determined.
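
Concretely, this is the sort of output that would let us root-cause it
(standard FMA commands; what they show depends on the fault):

  # resources FMA has diagnosed as faulty
  fmadm faulty

  # the fault log, and the underlying error telemetry, verbosely
  fmdump -v
  fmdump -eV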

Meanwhile, we'll work on putting together some docs on how to
proceed when your pool can't be imported, because they would be
good to have.  And, as Anton notes, we can't scrub the pool if
we can't import the pool.
 -- richard
