Drew Balfour wrote:
Now I wonder where that error came from. It was just a single
checksum error. It couldn't go away with an earlier scrub, and
seemingly left no traces of badness on the drive. Something
serious? At least it looks a tad contradictory: "Applications are
unaffected", yet the error is "unrecoverable", and once cleared, there
is no error left.
What happens if you rescrub the pool after clearing the errors? If zfs
has reused whatever was causing the issue, then it shouldn't be
surprising that the error will show up again.
Are you assuming that bad disk blocks are returned to the free pool?
This is more of a problem for file systems with pre-allocated metadata,
such as UFS. In UFS, if a sector in a superblock copy goes bad, it
will still be reused. In ZFS, metadata is COW and redundant, so
there is no forced re-use of disk blocks (except for the uberblocks
which are 4x redundant and use 128-slot circular queues).
Could you propose alternate wording?
My $.02: the wording in the error message is rather opaque.
"Unrecoverable error" indicates to me that something was lost;
technically this is true, but zfs was able to replicate the data from
another source. This is not all that clear from the error:
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are
unaffected.
This doesn't indicate if the attempt was successful or not. We all
know it was, because if it wasn't, we'd (a) see another error instead
and/or (b) see something other than
"errors: No known data errors". But, unless you know zfs well enough
to make that leap, you're left wondering what actually happened.
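
The leap described above (a repair was attempted, and the pool reports
no known data errors, so the repair must have worked) can be written
down as a small check. This is only a sketch in Python: the function
name summarize_status and its return labels are made up for
illustration, and it matches on the two message strings quoted in this
thread rather than on any documented output format.

```python
def summarize_status(status_text: str) -> str:
    """Classify a `zpool status` excerpt using the reasoning from the thread.

    Hypothetical helper: it only looks for the two strings quoted in this
    discussion and is not a parser for the full `zpool status` output.
    """
    # The status text is wrapped across lines, so collapse all whitespace
    # before searching for the sentences.
    text = " ".join(status_text.split())

    attempted = "An attempt was made to correct the error" in text
    no_data_errors = "errors: No known data errors" in text

    if attempted and no_data_errors:
        # A correction was attempted and nothing is reported as lost.
        return "repaired"
    if attempted:
        # A correction was attempted, but we cannot tell how it ended.
        return "repair attempted, outcome unclear"
    if no_data_errors:
        return "no known errors"
    return "unknown"


sample = """status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
        unaffected.
errors: No known data errors
"""

print(summarize_status(sample))
```

A user should not have to do this inference by hand, which is the
argument for putting the outcome into the status text itself.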
Granted, the 'verbose' error page (http://www.sun.com/msg/ZFS-8000-9P)
does a much better job of explaining. However, confusing terse error
messages are never good, and asking the user to go look stuff up in
order to understand isn't good either. The verbose error page also
doesn't explain that, even without a replicated configuration,
metadata is replicated, so errors can be recovered from a seemingly
'unrecoverable' state.
Does anyone know why it's "applications" and not "data"?
Perhaps something like:
status: One or more devices has experienced an error. A successful
attempt to correct the error was made using a replicated copy of the
data. Data on the pool is unaffected.
I think this is on the right track. But the repair method, "replicated
copy of the data," should be more vague because there are other ways to
repair data.
Does anyone else have better wording?
-- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss