I'm happy to see that someone else brought up this topic. I had a nasty long power failure last night that drained the APC/UPS batteries dry.[1] :-(
I changed the subject line somewhat because I feel that the issue is one of honesty as opposed to reliability. I *feel* that ZFS is reliable out past six nines ( rho=0.999999 ) for two reasons: first, I have never seen it fail me, and I have pounded it with some fairly offensive abuse under terrible conditions[2]; and secondly, everyone in the computer industry is trying to steal^H^H^H^H^Himplement it in their OS of choice. There must be a reason for that.

However, I have repeatedly run into problems when I need to boot after a power failure. I see vdevs being marked as FAULTED regardless of whether any hard errors are actually reported by the on-disk SMART firmware. I am able to remove these FAULTed devices temporarily, re-insert the same disks, and then run fine for months. Until the next long power failure. This is where "honesty" becomes a question, because I have to question the severity of the FAULT when I know from past experience that the disk(s) in question can be removed and re-inserted and life is fine for months. Were hard disk manufacturers involved in this error message logic? :-P

A power failure, a really nice long one, happened last night, and again when I booted up I saw nasty error messages. Here is *precisely* what I saw last night:

{3} ok boot -s
Resetting ...

Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #53264354.
Ethernet address 0:3:ba:2c:bf:e2, Host ID: 832cbfe2.

Rebooting with command: boot -s
Boot device: /p...@9,600000/SUNW,q...@2/f...@0,0/d...@w21000004cfb6f0ff,0:a
File and args: -s

SunOS Release 5.10 Version Generic_138888-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Hostname: jupiter
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Mar 24 01:28:04 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
#
/***************************************************/
/*  the very first thing I check is zpool fibre0   */
/***************************************************/
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

*************************************************
* everything looks fine, okay, thank you to ZFS *
*  ... and then I try to boot to full init 3    *
*************************************************

# exit
svc.startd: Returning to milestone all.
Reading ZFS config: done.
Mounting ZFS filesystems: (51/51)

jupiter console login: root
Password:
Last login: Sat Mar  7 19:39:00 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

**************************************************************
* everything STILL looks fine, and only seconds have passed. *
* Then .. I get bombarded with SEVERITY: Major faults        *
**************************************************************

#
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3780a2dd-7381-c053-e186-8112b463c2b7
DESC: The number of I/O errors associated with a ZFS device exceeded
        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
        for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
        attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 146dad1d-f195-c2d6-c630-c1adcd58b288
DESC: The number of I/O errors associated with a ZFS device exceeded
        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
        for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
        attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

********************************************************
I know that I have been here before after a power failure
with similar messages.
They were not entirely honest about the SEVERITY of the device
faults.  The faults are certainly not "Major faults".
*********************************************************

# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver in progress for 0h0m, 0.02% done, 21h7m to go
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors
#
Mar 24 01:29:53 jupiter ntpdate[733]: no server suitable for synchronization found

***********************************************************
* at this point I go look at my cisco routers and check my
* AC and get things booting. I also curse my new APC gear
* for not signaling a power failure ... but that is another
* story.
***********************************************************

So can I *trust* what I am seeing? Do I really believe that I have a
SEVERE fault in a disk? Last time I did this ( last month actually )
there were two disks faulted. Today there is just one. As usual I will
NOT order a new replacement disk. I just let that zpool sort itself out.
It will take an hour or so to sync up that hot spare.

The machine in question is a production Solaris 10 server:

# uname -a
SunOS jupiter 5.10 Generic_138888-03 sun4u sparc SUNW,Sun-Fire-480R
# cat /etc/release
                       Solaris 10 5/08 s10s_u5wos_10 SPARC
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 24 March 2008

The zpool in question looks like so:

# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH     ALTROOT
fibre0   680G   536G   144G    78%  DEGRADED   -
z0      40.2G   103K  40.2G     0%  ONLINE     -
# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver completed after 1h35m with 0 errors on
        Tue Mar 24 03:04:49 2009
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors

Is there *really* a severe fault in that disk?

# luxadm -v display 21000018625d599d
Displaying information for: 21000018625d599d
Searching directory /dev/es for links to enclosures

DEVICE PROPERTIES for disk: 21000018625d599d
  Vendor:               HPQ
  Product ID:           BD1465822C
  Revision:             HP04
  Serial Num:           3KS36V5N000076218F5R
  Unformatted capacity: 140014.406 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0xffff
  Device Type:          Disk device
  Path(s):

  /dev/rdsk/c2t17d0s2
  /devices/p...@8,600000/SUNW,q...@1/f...@0,0/s...@w21000018625d599d,0:c,raw
    LUN path port WWN:          21000018625d599d
    Host controller port WWN:   210000e08b08f1a1
    Path status:                O.K.

What does the SMART firmware say?
# /root/bin/smartctl -a /dev/rdsk/c2t17d0s0
smartctl version 5.33 [sparc-sun-solaris2.8] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HPQ      BD1465822C       Version: HP04
Serial number: 3KS36V5N000076218F5R
Device type: disk
Transport protocol: IEEE 1394 (SBP-2)
Local Time is: Tue Mar 24 14:09:07 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        68 C
Vendor (Seagate) cache information
  Blocks sent to initiator = 615507364
  Blocks received from initiator = 3004562974
  Blocks read from cache and sent to initiator = 94569699
  Number of read and write commands whose size <= segment size = 185763910
  Number of read and write commands whose size > segment size = 0

Error counter log:
         Errors Corrected by          Total    Correction    Gigabytes    Total
             ECC         rereads/     errors   algorithm     processed    uncorrected
         fast | delayed  rewrites  corrected   invocations  [10^9 bytes]  errors
read:    8952309      0         0    8952309      8952309      999.277        0
write:         0      0         0          0           12     1328.105        0
verify:   934290      0         0     934290       934290      146.816        0

Non-medium error count:        1
Error Events logging not supported

SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background short  Completed        -       31              -   [-   -   -]

It is hard to see, but the total uncorrected error count is zero.

***********************************************
*  So let's just correct the "SEVERE" fault.  *
***********************************************

# zpool detach fibre0 c2t17d0
# zpool detach fibre0 c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 1h35m with 0 errors on
        Tue Mar 24 03:04:49 2009
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          c5t1d0     ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0

errors: No known data errors
# zpool attach fibre0 c5t1d0 c2t17d0
# zpool add fibre0 spare c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 2.86% done, 1h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#

I have also learned that you cannot trust that resilver progress report
either. It will not take 1h18m to complete. If I wait 20 minutes I'll
get *nearly* the same estimate. The process must not be deterministic
in nature.

# zpool status
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h39m, 34.24% done, 1h15m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

  pool: z0
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z0            ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s7  ONLINE       0     0     0
            c1t1d0s7  ONLINE       0     0     0

errors: No known data errors
# fmadm faulty -afg
#

I do TOTALLY trust that last line that says "No known data errors",
which makes me wonder if the SEVERE faults are for unknown data
errors :-)

--
Dennis Clarke

sig du jour: "An appeaser is one who feeds a crocodile, hoping it will
eat him last." -- Winston Churchill

[1] I really want to know where PowerChute for Solaris went to.

[2] I would create a zpool of striped mirrors based on multiple USB keys
and on disks on IDE/SATA, with or without compression, and with
copies={1|2|3}, and while running an ON compile I'd pull the USB keys
out and yank the power on the IDE/SATA or fibre disks. ZFS would not
throw a fatal error nor drop a bit of data. Performance suffered, but
data did not.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
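P.S. Since I seem to end up doing this little dance after every long power failure, here is a minimal sketch of it as a script. This is my own illustration, not anything shipped with Solaris: the pool and device names (fibre0, c2t17d0, c5t1d0, c2t22d0) are the ones from this report, and the dry-run wrapper is a hypothetical safety guard of my own. It defaults to just echoing the zpool commands; set APPLY=1 and run it as root to execute them for real.

```shell
#!/bin/sh
# Sketch of the spare/fault clean-up sequence shown above, using the
# pool and device names from this report.  Adjust for your own pool.
POOL=fibre0
BAD=c2t17d0       # disk that ZFS marked FAULTED after the power failure
SURVIVOR=c5t1d0   # healthy half of the degraded mirror
SPARE=c2t22d0     # hot spare that resilvered in

# Safety wrapper: unless APPLY is set, print each command instead of
# executing it, so the script is a dry run by default.
run() {
    if [ -n "$APPLY" ]; then
        "$@"
    else
        echo "$@"
    fi
}

run zpool detach "$POOL" "$BAD"               # drop the "faulted" disk
run zpool detach "$POOL" "$SPARE"             # release the hot spare
run zpool attach "$POOL" "$SURVIVOR" "$BAD"   # re-mirror with the same disk
run zpool add "$POOL" spare "$SPARE"          # put the spare back
```

Note that 'zpool clear fibre0' is the simpler alternative that the status output itself suggests when you believe the fault is transient, at the cost of keeping the spare attached until the resilver sorts itself out.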