On Fri, Apr 17, 2009 at 12:25 PM, Richard Elling <richard.ell...@gmail.com> wrote:
> Drew Balfour wrote:
>
>>>>>> Now I wonder where that error came from. It was just a single checksum
>>>>>> error. It couldn't go away with an earlier scrub, and seemingly left no
>>>>>> traces of badness on the drive. Something serious? At least it looks a
>>>>>> tad contradictory: "Applications are unaffected.", it is unrecoverable,
>>>>>> and once cleared, there is no error left.
>>
>> What happens if you rescrub the pool after clearing the errors? If zfs has
>> reused whatever was causing the issue, then it shouldn't be surprising that
>> the error will show up again.
>
> Are you assuming that bad disk blocks are returned to the free pool?
> This is more of a problem for file systems with pre-allocated metadata,
> such as UFS. In UFS, if a sector in a superblock copy goes bad, it
> will still be reused. In ZFS, metadata is COW and redundant, so
> there is no forced re-use of disk blocks (except for the uberblocks
> which are 4x redundant and use 128-slot circular queues).
>
>>> Could you propose alternate wording?
>>
>> My $.02, but the wording in the error message is rather obtuse.
>> "Unrecoverable error" indicates to me that something was lost; technically
>> this is true, but zfs was able to replicate the data from another source.
>> This is not all that clear from the error:
>>
>>    status: One or more devices has experienced an unrecoverable error. An
>>            attempt was made to correct the error. Applications are
>>            unaffected.
>>
>> This doesn't indicate if the attempt was successful or not. We all know it
>> was, because if it wasn't, we'd (a) see another error instead and/or
>> (b) see something other than "errors: No known data errors". But, unless
>> you know zfs well enough to make that leap, you're left wondering what
>> actually happened.
>>
>> Granted, the 'verbose' error page (http://www.sun.com/msg/ZFS-8000-9P)
>> does a much better job of explaining.
>> However, confusing terse error messages are never good, and asking the user
>> to go look stuff up in order to understand isn't good either. Also, the
>> verbose error page also doesn't explain that despite not having a replicated
>> configuration, metadata is replicated and so errors can be recovered from a
>> seemingly 'unrecoverable' state.
>>
>> Does anyone know why it's "applications" and not "data"?
>>
>> Perhaps something like:
>>
>>    status: One or more devices has experienced an error. A successful attempt
>>            to correct the error was made using a replicated copy of the data.
>>            Data on the pool is unaffected.
>
> I think this is on the right track. But the repair method, "replicated copy
> of the data," should be more vague because there are other ways to
> repair data.
>
> Does anyone else have better wording?
> -- richard

Unless you want to have a different response for each of the repair methods,
I'd just drop that part:

   status: One or more devices has experienced an error. The error has been
           automatically corrected by zfs. Data on the pool is unaffected.

I suppose you could do a "for more information please contact Sun" or
something along those lines as well?

--Tim

(my reply-to-all skills have been suffering lately, sorry Richard).
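Richard's aside about uberblocks describes the one case where ZFS does overwrite the same disk blocks over and over. The following is a minimal Python sketch of that idea. It makes simplifying assumptions: each of four label copies holds a fixed 128-slot ring, each transaction group's uberblock goes into slot txg % 128 in every copy, and on import the newest uberblock that passes its (simulated) checksum wins. The `Uberblock` class, its `valid` flag standing in for checksum verification, and the helper names are all hypothetical. This is not the actual ZFS code, and real on-disk details are omitted.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

RING_SLOTS = 128   # "128-slot circular queues" (simplified: fixed size)
LABEL_COPIES = 4   # "4x redundant": one ring per label copy


@dataclass
class Uberblock:
    txg: int             # transaction group that wrote this uberblock
    valid: bool = True   # hypothetical stand-in for a checksum check


Ring = List[Optional[Uberblock]]


def new_labels() -> List[Ring]:
    """Create LABEL_COPIES empty rings of RING_SLOTS slots each."""
    return [[None] * RING_SLOTS for _ in range(LABEL_COPIES)]


def write_uberblock(labels: List[Ring], ub: Uberblock) -> None:
    """Overwrite slot txg % RING_SLOTS in every label copy.

    This is the forced reuse: txg 1 and txg 129 land in the same slot.
    Each copy gets its own object, so damaging one copy leaves the others intact.
    """
    slot = ub.txg % RING_SLOTS
    for ring in labels:
        ring[slot] = replace(ub)


def select_uberblock(labels: List[Ring]) -> Optional[Uberblock]:
    """Return the valid uberblock with the highest txg across all copies."""
    best: Optional[Uberblock] = None
    for ring in labels:
        for ub in ring:
            if ub is not None and ub.valid and (best is None or ub.txg > best.txg):
                best = ub
    return best


if __name__ == "__main__":
    labels = new_labels()
    for txg in range(1, 300):
        write_uberblock(labels, Uberblock(txg))
    # Simulate a bad sector under the newest slot of one label copy only.
    labels[0][299 % RING_SLOTS].valid = False
    print(select_uberblock(labels).txg)  # 299, read from another copy
```

Because 128 consecutive transaction groups map to distinct slots, each ring only ever holds the most recent 128 uberblocks. A single bad slot in one copy costs nothing as long as another copy of that slot still verifies.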
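Drew's main complaint is that the current status line doesn't say whether the repair attempt succeeded. Tim suggests leaving the repair method out of the message. The hypothetical Python sketch below picks the status line from the outcome of the repair. The function `status_text`, its two counters, and the wording for the unrepaired case are invented for illustration. Only the repaired-case text is Tim's proposal, and none of this is how zpool actually builds its output.

```python
from typing import Optional

# Tim's proposed wording for the case where every error was corrected.
REPAIRED_STATUS = (
    "One or more devices has experienced an error. The error has been "
    "automatically corrected by zfs. Data on the pool is unaffected."
)

# Hypothetical wording for the unrepaired case; not proposed in this thread.
UNREPAIRED_STATUS = (
    "One or more devices has experienced an error that could not be "
    "corrected. Data on the pool may be affected."
)


def status_text(errors_found: int, errors_repaired: int) -> Optional[str]:
    """Hypothetical helper: choose the status line from the repair outcome.

    Returns None when there is nothing to report.
    """
    if not 0 <= errors_repaired <= errors_found:
        raise ValueError("expected 0 <= errors_repaired <= errors_found")
    if errors_found == 0:
        return None
    if errors_repaired == errors_found:
        # Deliberately silent about *how* it was repaired (mirror,
        # replicated metadata, ...), as Tim suggests.
        return REPAIRED_STATUS
    return UNREPAIRED_STATUS


if __name__ == "__main__":
    print(status_text(1, 1))  # e.g. the single checksum error in this thread
```

The sketch takes Tim's side of the open question (one generic message rather than one per repair method). The key point is that the outcome, not the attempt, decides which text the user sees.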
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss