Drew Balfour wrote:
Now I wonder where that error came from. It was just a single checksum error. It did not go away with an earlier scrub, and it seemingly left no trace of badness on the drive. Is it something serious? At the very least the message looks a tad contradictory: it says "Applications are unaffected," it calls the error unrecoverable, and once cleared, there is no error left.

What happens if you rescrub the pool after clearing the errors? If zfs has reused whatever was causing the issue, then it shouldn't be surprising that the error will show up again.

Are you assuming that bad disk blocks are returned to the free pool?
This is more of a problem for file systems with pre-allocated metadata,
such as UFS.  In UFS, if a sector in a superblock copy goes bad, it
will still be reused.  In ZFS, metadata is COW and redundant, so
there is no forced re-use of disk blocks (except for the uberblocks
which are 4x redundant and use 128-slot circular queues).
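
To make the uberblock remark concrete, here is a rough Python sketch of a 4x-replicated, 128-slot ring. It is only an illustration under assumptions: the names (Uberblock, Device, best) are made up for this example, CRC32 stands in for the real label checksum, and the on-disk layout is not modelled at all. The point is just that a slot is overwritten only when the ring wraps, and that a damaged newest entry can be skipped in favour of another copy or an older transaction group.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional

LABEL_COPIES = 4    # "4x redundant" in the text above
RING_SLOTS = 128    # "128-slot circular queues" in the text above


@dataclass
class Uberblock:
    txg: int          # transaction group number
    payload: bytes    # stand-in for the pointer to the rest of the pool
    checksum: int     # CRC32 here; an assumption, not the real algorithm


def make_uberblock(txg: int, payload: bytes) -> Uberblock:
    return Uberblock(txg, payload, zlib.crc32(payload))


def is_valid(ub: Optional[Uberblock]) -> bool:
    # An empty slot or a checksum mismatch both count as unusable.
    return ub is not None and zlib.crc32(ub.payload) == ub.checksum


class Device:
    def __init__(self) -> None:
        # One independent ring per label copy.
        self.labels: List[List[Optional[Uberblock]]] = [
            [None] * RING_SLOTS for _ in range(LABEL_COPIES)
        ]

    def write(self, ub: Uberblock) -> None:
        # The only forced reuse: the slot is picked by txg modulo ring size.
        slot = ub.txg % RING_SLOTS
        for ring in self.labels:
            ring[slot] = Uberblock(ub.txg, ub.payload, ub.checksum)

    def best(self) -> Optional[Uberblock]:
        # Newest entry that still passes its checksum, across all copies.
        valid = [ub for ring in self.labels for ub in ring if is_valid(ub)]
        return max(valid, key=lambda ub: ub.txg, default=None)


if __name__ == "__main__":
    dev = Device()
    for txg in range(1, 201):
        dev.write(make_uberblock(txg, f"root@{txg}".encode()))
    dev.labels[0][200 % RING_SLOTS].payload = b"garbage"  # damage one copy
    print(dev.best().txg)  # the other three copies of the newest entry remain
```

In this toy model a bad sector under one slot costs at most one copy of one transaction group, which matches the claim that the reuse is harmless because of the redundancy.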


Could you propose alternate wording?

My $.02: the wording in the error message is rather obtuse. "Unrecoverable error" indicates to me that something was lost; technically this is true, but zfs was able to replicate the data from another source. This is not all that clear from the error:

status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error. Applications are unaffected.

This doesn't indicate whether the attempt was successful. We all know it was, because if it wasn't, we'd (a) see another error instead and/or (b) see something other than "errors: No known data errors". But unless you know zfs well enough to make that leap, you're left wondering what actually happened.
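
The leap described above can be written down as a small check on the status text. This is a hedged sketch, not a supported interface: the function names are invented for this example, it only looks at text that has already been captured, and it assumes the "errors:" line has the form quoted in this thread.

```python
from typing import Optional

NO_ERRORS = "No known data errors"  # wording quoted in this thread


def data_errors_line(status_text: str) -> Optional[str]:
    """Return the text after 'errors:' in captured status output, if any."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            return stripped[len("errors:"):].strip()
    return None


def repair_looks_successful(status_text: str) -> Optional[bool]:
    """True if no data errors are reported, False if some are,
    None if there is no 'errors:' line to judge from."""
    summary = data_errors_line(status_text)
    if summary is None:
        return None
    return summary == NO_ERRORS


SAMPLE = """\
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error. Applications are unaffected.
errors: No known data errors
"""

if __name__ == "__main__":
    print(repair_looks_successful(SAMPLE))
```

The status paragraph itself never enters the decision, which is exactly the complaint: the outcome has to be inferred from a different line.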

Granted, the 'verbose' error page (http://www.sun.com/msg/ZFS-8000-9P) does a much better job of explaining. However, confusing terse error messages are never good, and asking the user to go look things up in order to understand them isn't good either. The verbose error page also doesn't explain that, even without a replicated configuration, metadata is replicated, so errors can be recovered from a seemingly 'unrecoverable' state.

Does anyone know why it's "applications" and not "data"?

Perhaps something like:

status: One or more devices has experienced an error. A successful attempt to
        correct the error was made using a replicated copy of the data.
        Data on the pool is unaffected.

I think this is on the right track. But the repair method, "replicated copy
of the data," should be more vague because there are other ways to
repair data.

Does anyone else have better wording?
-- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
