Uwe Dippel wrote:

> If it was (successful), that would have been something. It wasn't.

It was; zfs successfully repaired the data, as is evidenced by the lack of errors in the status output:

errors: No known data errors

> 'status' brought up the 'unrecoverable error', whatever number of 'scrub's I did.

Hence the misunderstanding. The scrub is telling you, rather confusingly, that the device has an error, but zfs has managed to work around this error and maintain data integrity. The scrub will not 'fix' the error, as zfs can't fix, say, a bad block on your disk drive. It will, however, maintain data integrity if possible. See below for an example of what I'm trying to convey.

"Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'. "
This does sound scary, at least to me. How to 'determine if the device needs to be replaced'?
Should I 'clear' or 'replace'?

It depends on what caused the error. For example, if I have a mirrored pool and accidentally format one side of the mirror, zpool status will show you the errors and leave it up to you.

For example:

# zpool create swim mirror c4t1d0s0 c4t1d0s1
# zpool status
  pool: swim
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out

!!oh no, I just zero'd out half of one of my mirror devices!!

# zpool scrub swim

# zpool status
  pool: swim
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009
config:

        NAME          STATE     READ WRITE CKSUM
        swim          DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c4t1d0s0  DEGRADED     0     0    87  too many errors
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

!!Since I didn't actually have any data on the pool, the only errors were metadata checksum errors.

The confusion here is that in the above output, "error" has different meanings depending on its context.

"One or more devices has experienced an unrecoverable error."

In this context, "error" refers to zfs reading data off the disk, and finding that the checksum doesn't match (or in this case, actually exist). zfs has no idea why the checksum doesn't match; it could be a drive error, a driver error, a user caused error, bad bits on the bus, whatever. zfs cannot correct these errors, any more than any software can fix any hardware error. We do know that whatever the error was, we didn't get an associated "I/O Error" from the drive, as that column is zero. So the drive doesn't even know there's an error!

"An attempt was made to correct the error."

In this context, "error" refers to the actual bad checksum. zfs can fix this. In this case, by either reading from the other side of the mirror or from the replicated metadata. It should be noted that this attempt was successful, as zfs was able to maintain data integrity. The is implied in the error, confusingly.


"scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009"
"errors: No known data errors"

In this context, "error" refers to uncorrectable, unrecoverable data corruption.
There is a problem with your data, and zfs was unable to fix it. In this case there were no such errors, which is a good thing.
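
For contrast, when zfs genuinely can't repair something, that last line changes. Roughly (illustrative output, not from my pool, and the file name is made up):

# zpool status -v swim
  ...
errors: Permanent errors have been detected in the following files:

        /swim/somefile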

Now, as to whether to replace or clear...

In this particular case, I know what caused the error. Me. I know the disk is fine. I can simply:

# zpool clear swim
# zpool status
  pool: swim
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

zpool clear simply zeroes the device error counters. I know there was nothing wrong with the device, so I can forget about those errors.
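
If you'd rather not reset everything in the pool at once, zpool clear also takes an optional device argument, so you can clear just the one disk you've accounted for:

# zpool clear swim c4t1d0s0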

If I didn't know the cause of the error, and suspected a bad disk, I'd probably choose to replace the device.

> In the end, it needed a 'clear' and that one CKSUM error went away. As it seems without further consequences and a fully sane disk. Don't call that 'self-healing'. This is an arcane method demanding plenty of user activity, interaction, reading-up, etc.

zpool clear will _always_ clear _all_ of the error counters. Whether to clear them is a sysadmin's choice. You don't have to clear the errors; if you'd rather keep track of all of the errors over the lifetime of the pool, go right ahead.

# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  ONLINE       0     0     0
errors: No known data errors
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
# zpool scrub swim
# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  DEGRADED       0     0 652    too many errors
errors: No known data errors
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
# zpool scrub swim
# zpool status | egrep "errors: |c4t1d0s0"
       c4t1d0s0  DEGRADED       0     0 1.27K  too many errors
errors: No known data errors

You can zpool clear at any time, or you can never do it.
Of course, if you don't know the cause of the errors, clearing probably isn't the best course of action, if you value your data.
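
Note that clearing the counters doesn't erase the history, either: fmd keeps its own fault log that you can go back and read with fmdump. Something like this (illustrative; your output will obviously differ):

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 16 19:28:53.5582 cd6fe5bc-9137-c32a-c811-ba98dac5dbe9 ZFS-8000-GH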

Replacing the device will also reset the counters, obviously, as the old device is removed and the new device (hopefully) has no problems:

# zpool status | grep c4t1d0s0
            c4t1d0s0  DEGRADED     0     0    84  too many errors
# zpool replace swim c4t1d0s0 c4t1d0s3
# zpool status | grep c4t1d0s3
            c4t1d0s3  ONLINE       0     0     0  83.5K resilvered
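
And if you physically swap the bad disk for a new one in the same slot, you can name the device just once and zfs will rebuild onto the replacement in place:

# zpool replace swim c4t1d0s0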

> It seems most in here don't run production servers. A term like 'unrecoverable' sends me into a state of frenzy.

Personally, I agree. I think the wording of the current message is confusing at best, and panic inducing at worst.

> If this was the case, Toby, I wouldn't want to have to type anything. I'd rather have the system detecting the situation on its own accord, trying the redundant metadata (we do have snapshots, don't we!), and scrub on its very own. At the end, a mail to root would be in order, informing me that an error has been corrected and no data compromised at all.

That's actually exactly what happened, minus the email. In your case, and in all the examples above, the "zpool scrub" is entirely unnecessary. I ran it in the examples to force zfs to examine the pool and find the errors. If I'd left it alone, and done things to the file system, it would have found the errors and dealt with them as the data was accessed. In other words, I could have done:

!!put some data on the pool:
# dd if=/dev/urandom of=/swim/a bs=1024x1024 count=60
60+0 records in
60+0 records out
!!do something foolish
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out
!!use the data on the pool
# dd if=/swim/a of=/b bs=1024x1024
60+0 records in
60+0 records out
# zpool status
  pool: swim
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0    14
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

Now, if you'd like zfs to email you when it finds errors, that's easy enough to do, since zfs helpfully logs failures via the FMA daemon (fmd). By default those messages land in /var/adm/messages, but sending an email to root, or paging you, would be trivial to implement:

   Apr 16 19:28:53 pcandle3 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
   Apr 16 19:28:53 pcandle3 EVENT-TIME: Thu Apr 16 19:28:53 PDT 2009
   Apr 16 19:28:53 pcandle3 PLATFORM: Sun Fire X4200 M2, CSN: 0718BD03B4, HOSTNAME: pcandle3
   Apr 16 19:28:53 pcandle3 SOURCE: zfs-diagnosis, REV: 1.0
   Apr 16 19:28:53 pcandle3 EVENT-ID: cd6fe5bc-9137-c32a-c811-ba98dac5dbe9
   Apr 16 19:28:53 pcandle3 DESC: The number of checksum errors associated with a ZFS device
   Apr 16 19:28:53 pcandle3 exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.
   Apr 16 19:28:53 pcandle3 AUTO-RESPONSE: The device has been marked as degraded.  An attempt
   Apr 16 19:28:53 pcandle3 will be made to activate a hot spare if available.
   Apr 16 19:28:53 pcandle3 IMPACT: Fault tolerance of the pool may be compromised.
   Apr 16 19:28:53 pcandle3 REC-ACTION: Run 'zpool status -x' and replace the bad device.
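
As a sketch of the email part (just one crude way to do it; it assumes mailx is installed and that "all pools are healthy" is the all-clear string on your release), you could cron something like:

#!/bin/sh
# check pool health from root's crontab; mail root anything abnormal
STATUS=`/usr/sbin/zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
        echo "$STATUS" | mailx -s "zpool problem on `hostname`" root
fi

Anything fancier (paging, SNMP traps, and so on) can hang off the same check.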

However, I think we can all agree that _not_ telling you that there were problems is not a good idea.

I think the argument against automatically scrubbing the entire pool is that scrubs are very I/O intensive, and that could negatively impact performance. Assuming the pool is redundantly configured, there's no danger of losing data, and any bad data or checksums will be corrected on-the-fly.
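
If you do want regular scrubs without having to remember to kick them off, the usual compromise is to cron them for a quiet window; an illustrative crontab entry (adjust the schedule and pool name to taste):

# scrub the pool every Sunday at 02:00
0 2 * * 0 /usr/sbin/zpool scrub swim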

Of course, if it were my system and I got random, unexplained checksum errors, I'd probably scrub the pool, performance be damned.

-Drew

