Uwe Dippel wrote:

> If it was (successful), that would have been something. It wasn't.

It was; zfs successfully repaired the data, as is evidenced by the lack of errors in the status output:

errors: No known data errors

> 'status' brought up the 'unrecoverable error', whatever number of 'scrub's I did.

Hence the misunderstanding. The scrub is telling you, rather confusingly, that the device has an error, but zfs has managed to work around this error and maintain data integrity. The scrub will not 'fix' the error, as zfs can't fix, say, a bad block on your disk drive. It will, however, maintain data integrity if possible. See below for an example of what I'm trying to convey.

"Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'. "
This does sound scary, at least to me. How to 'determine if the device needs to be replaced'?
Should I 'clear' or 'replace'?

It depends on what caused the error. For example, if I have a mirrored pool and accidentally format one side of the mirror, zpool status will show you the errors and leave it up to you.

For example:

# zpool create swim mirror c4t1d0s0 c4t1d0s1
# zpool status
  pool: swim
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out

!!oh no, I just zero'd out half of one of my mirror devices!!

# zpool scrub swim

# zpool status
  pool: swim
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009
config:

        NAME          STATE     READ WRITE CKSUM
        swim          DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c4t1d0s0  DEGRADED     0     0    87  too many errors
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

!!Since I didn't actually have any data on the pool, the only errors were metadata checksum errors.

The confusion here is that in the above output, "error" has different meanings depending on its context.

"One or more devices has experienced an unrecoverable error."

In this context, "error" refers to zfs reading data off the disk, and finding that the checksum doesn't match (or in this case, actually exist). zfs has no idea why the checksum doesn't match; it could be a drive error, a driver error, a user caused error, bad bits on the bus, whatever. zfs cannot correct these errors, any more than any software can fix any hardware error. We do know that whatever the error was, we didn't get an associated "I/O Error" from the drive, as that column is zero. So the drive doesn't even know there's an error!

"An attempt was made to correct the error."

In this context, "error" refers to the actual bad checksum. zfs can fix this. In this case, by either reading from the other side of the mirror or from the replicated metadata. It should be noted that this attempt was successful, as zfs was able to maintain data integrity. The is implied in the error, confusingly.


"scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009"
"errors: No known data errors"

In this context, "error" refers to uncorrectable, unrecoverable data corruption.
There is a problem with your data, and zfs was unable to fix it. In this case there were no such errors, which is a good thing.
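
For contrast, when zfs genuinely can't repair something, that last line changes. Roughly (illustrative output, not from my pool, and the file name is made up):

# zpool status -v swim
  ...
errors: Permanent errors have been detected in the following files:

        /swim/somefile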

Now, as to whether to replace or clear...

In this particular case, I know what caused the error. Me. I know the disk is fine. I can simply:

# zpool clear swim
# zpool status
  pool: swim
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

zpool clear simply zeroes the device error counters. I know there was nothing wrong with the device, so I can forget about those errors.
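
If you'd rather not reset everything in the pool at once, zpool clear also takes an optional device argument, so you can clear just the one disk you've accounted for:

# zpool clear swim c4t1d0s0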

If I didn't know the cause of the error, and suspected a bad disk, I'd probably choose to replace the device.

> In the end, it needed a 'clear' and that one CKSUM error went away. As it seems without further consequences and a fully sane disk. Don't call that 'self-healing'. This is an arcane method demanding plenty of user activity, interaction, reading-up, etc.

zpool clear will _always_ clear _all_ of the error counters. Whether to clear them is a sysadmin's choice. You don't have to clear the errors; if you'd rather keep track of all of the errors over the lifetime of the pool, go right ahead.

# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  ONLINE       0     0     0
errors: No known data errors
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
# zpool scrub swim
# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  DEGRADED       0     0 652    too many errors
errors: No known data errors
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
# zpool scrub swim
# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  DEGRADED       0     0 1.27K  too many errors
errors: No known data errors

You can zpool clear at any time, or you can never do it.
Of course, if you don't know the cause of the errors, clearing probably isn't the best course of action, if you value your data.
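
Note that clearing the counters doesn't erase the history, either: fmd keeps its own fault log that you can go back and read with fmdump. Something like this (illustrative; your output will obviously differ):

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 16 19:28:53.5582 cd6fe5bc-9137-c32a-c811-ba98dac5dbe9 ZFS-8000-GH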

Replacing the device will also reset the counters, obviously, as the old device is removed and the new device (hopefully) has no problems:

# zpool status | grep c4t1d0s0
            c4t1d0s0  DEGRADED     0     0    84  too many errors
# zpool replace swim c4t1d0s0 c4t1d0s3
# zpool status | grep c4t1d0s3
            c4t1d0s3  ONLINE       0     0     0  83.5K resilvered
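
And if you physically swap the bad disk for a new one in the same slot, you can name the device just once and zfs will rebuild onto the replacement in place:

# zpool replace swim c4t1d0s0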

> It seems most in here don't run production servers. A term like 'unrecoverable' sends me into a state of frenzy.

Personally, I agree. I think the wording of the current message is confusing at best, and panic inducing at worst.

> If this was the case, Toby, I wouldn't want to have to type anything. I'd rather have the system detecting the situation on its own accord, trying the redundant metadata (we do have snapshots, don't we!), and scrub on its very own. At the end, a mail to root would be in order, informing me that an error has been corrected and no data compromised at all.

That's actually exactly what happened, minus the email. In your case, and in all the examples above, the "zpool scrub" is entirely unnecessary. I ran it in the examples to force zfs to examine the pool and find the errors. If I'd left it alone, and done things to the file system, it would have found the errors and dealt with them as the data was accessed. In other words, I could have done:

!!put some data on the pool:
# dd if=/dev/urandom of=/swim/a bs=1024x1024 count=60
60+0 records in
60+0 records out
!!do something foolish
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out
!!use the data on the pool
# dd if=/swim/a of=/b bs=1024x1024
60+0 records in
60+0 records out
# zpool status
  pool: swim
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0    14
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

Now, if you'd like zfs to email you when it finds errors, that's easy enough to do, since zfs helpfully logs failures via the FMA daemon (fmd). By default those messages land in /var/adm/messages, but sending an email to root, or paging you, would be trivial to implement:

   Apr 16 19:28:53 pcandle3 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
   Apr 16 19:28:53 pcandle3 EVENT-TIME: Thu Apr 16 19:28:53 PDT 2009
   Apr 16 19:28:53 pcandle3 PLATFORM: Sun Fire X4200 M2, CSN: 0718BD03B4, HOSTNAME: pcandle3
   Apr 16 19:28:53 pcandle3 SOURCE: zfs-diagnosis, REV: 1.0
   Apr 16 19:28:53 pcandle3 EVENT-ID: cd6fe5bc-9137-c32a-c811-ba98dac5dbe9
   Apr 16 19:28:53 pcandle3 DESC: The number of checksum errors associated with a ZFS device
   Apr 16 19:28:53 pcandle3 exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.
   Apr 16 19:28:53 pcandle3 AUTO-RESPONSE: The device has been marked as degraded.  An attempt
   Apr 16 19:28:53 pcandle3 will be made to activate a hot spare if available.
   Apr 16 19:28:53 pcandle3 IMPACT: Fault tolerance of the pool may be compromised.
   Apr 16 19:28:53 pcandle3 REC-ACTION: Run 'zpool status -x' and replace the bad device.
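
As a sketch of the email part (just one crude way to do it; it assumes mailx is installed and that "all pools are healthy" is the all-clear string on your release), you could cron something like:

#!/bin/sh
# check pool health from root's crontab; mail root anything abnormal
STATUS=`/usr/sbin/zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
        echo "$STATUS" | mailx -s "zpool problem on `hostname`" root
fi

Anything fancier (paging, SNMP traps, and so on) can hang off the same check.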

However, I think we can all agree that _not_ telling you that there were problems is not a good idea.

I think the argument against automatically scrubbing the entire pool is that scrubs are very I/O intensive, and that could negatively impact performance. Assuming the pool is redundantly configured, there's no danger of losing data, and any bad data or checksums will be corrected on-the-fly.
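
If you do want regular scrubs without having to remember to kick them off, the usual compromise is to cron them for a quiet window; an illustrative crontab entry (adjust the schedule and pool name to taste):

# scrub the pool every Sunday at 02:00
0 2 * * 0 /usr/sbin/zpool scrub swim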

Of course, if it were my system and I got random, unexplained checksum errors, I'd probably scrub the pool, performance be damned.

-Drew

