Re: [zfs-discuss] How recoverable is an 'unrecoverable error'?

Richard Elling Fri, 17 Apr 2009 10:26:37 -0700

Drew Balfour wrote:

Now I wonder where that error came from. It was just a single checksum error. It couldn't go away with an earlier scrub, and seemingly left no traces of badness on the drive. Something serious? At least it looks a tad contradictory: "Applications are unaffected.", it is unrecoverable, and once cleared, there is no error left.
What happens if you rescrub the pool after clearing the errors? If zfs has reused whatever was causing the issue, then it shouldn't be surprising that the error will show up again.


Are you assuming that bad disk blocks are returned to the free pool?
This is more of a problem for file systems with pre-allocated metadata,
such as UFS.  In UFS, if a sector in a superblock copy goes bad, it
will still be reused.  In ZFS, metadata is COW and redundant, so
there is no forced re-use of disk blocks (except for the uberblocks
which are 4x redundant and use 128-slot circular queues).
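
A rough illustration of the circular-queue idea mentioned above, as a minimal Python sketch rather than anything taken from the ZFS source. The names `Uberblock`, `UberblockRing`, `write` and `active` are hypothetical, the `valid` flag merely stands in for a checksum check, and the slot choice (`txg % 128`) and "highest valid txg wins" rule are simplifying assumptions. Only one ring is modelled; the 4x redundancy is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

SLOTS = 128  # ring size quoted in the message above


@dataclass
class Uberblock:
    txg: int            # transaction group number (assumed ordering key)
    valid: bool = True  # stand-in for "checksum verifies"


class UberblockRing:
    """Toy model of one circular queue of uberblocks."""

    def __init__(self, slots: int = SLOTS) -> None:
        self.slots: List[Optional[Uberblock]] = [None] * slots

    def write(self, txg: int) -> int:
        # Assumption: the slot is picked by txg modulo the ring size,
        # so the oldest entry is the one that gets overwritten.
        index = txg % len(self.slots)
        self.slots[index] = Uberblock(txg)
        return index

    def active(self) -> Optional[Uberblock]:
        # Assumption: the newest entry that still verifies is used.
        candidates = [ub for ub in self.slots if ub is not None and ub.valid]
        return max(candidates, key=lambda ub: ub.txg, default=None)
```

In this toy model, if the newest slot fails its check, `active()` simply falls back to the next most recent entry, which is why re-use of these particular blocks is tolerable.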

Could you propose alternate wording?
My $.02, but the wording in the error message is rather obtuse. "Unrecoverable error" indicates to me that something was lost; technically this is true, but zfs was able to replicate the data from another source. This is not all that clear from the error:
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error. Applications are unaffected.
This doesn't indicate if the attempt was successful or not. We all know it was, because if it wasn't, we'd (a) see another error instead and/or (b) see something other than "errors: No known data errors". But, unless you know zfs well enough to make that leap, you're left wondering what actually happened.
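
The inference described above can be spelled out as a small Python sketch. The function name `data_errors_reported` is hypothetical, and `SAMPLE` is assembled from the fragments quoted in this thread rather than captured from a real `zpool status` run, so treat the exact layout as an assumption.

```python
# Sample assembled from the text quoted in this thread (not real captured output).
SAMPLE = """\
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
errors: No known data errors
"""


def data_errors_reported(status_text: str) -> bool:
    """Return True if the 'errors:' line says anything other than
    'No known data errors', i.e. the repair apparently did not succeed."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            return stripped != "errors: No known data errors"
    # No 'errors:' line at all: refuse to guess either way.
    raise ValueError("no 'errors:' line found in status text")


if __name__ == "__main__":
    print(data_errors_reported(SAMPLE))
```

This only encodes the "leap" a reader has to make today; it says nothing about which device was at fault or how the repair was done.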
Granted, the 'verbose' error page (http://www.sun.com/msg/ZFS-8000-9P) does a much better job of explaining. However, confusing terse error messages are never good, and asking the user to go look stuff up in order to understand isn't good either. Also, the verbose error page doesn't explain that despite not having a replicated configuration, metadata is replicated and so errors can be recovered from a seemingly 'unrecoverable' state.
Does anyone know why it's "applications" and not "data"?

Perhaps something like:
status: One or more devices has experienced an error. A successful attempt to
        correct the error was made using a replicated copy of the data.
        Data on the pool is unaffected.

I think this is on the right track. But the repair method, "replicated copy
of the data," should be more vague because there are other ways to
repair data.

Does anyone else have better wording?
-- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
