I'm happy to see that someone else brought up this topic. I had a nasty long power failure last night that drained the APC/UPS batteries dry.[1] :-(
I changed the subject line somewhat because I feel that the issue is one of honesty as opposed to reliability. I *feel* that ZFS is reliable out past six nines ( rho=0.999999 ) for two reasons: first, I have never seen it fail me, and I have pounded it with some fairly offensive abuse under terrible conditions[2]; and secondly, everyone in the computer industry is trying to steal^H^H^H^H^Himplement it in their OS of choice. There must be a reason for that.

However, I have repeatedly run into problems when I need to boot after a power failure. I see vdevs being marked as FAULTED regardless of whether any hard errors are actually reported by the on-disk SMART firmware. I am able to remove these FAULTed devices temporarily, re-insert the same disks, and then run fine for months. Until the next long power failure. This is where "honesty" becomes a question, because I have to question the severity of the FAULT when I know from past experience that the disk(s) in question can be removed and re-inserted and life is fine for months. Were hard disk manufacturers involved in this error message logic? :-P

A power failure, a really nice long one, happened last night, and again when I booted up I saw nasty error messages. Here is *precisely* what I saw last night:

{3} ok boot -s
Resetting ...

Sun Fire 480R, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.34, 16384 MB memory installed, Serial #53264354.
Ethernet address 0:3:ba:2c:bf:e2, Host ID: 832cbfe2.

Rebooting with command: boot -s
Boot device: /p...@9,600000/SUNW,q...@2/f...@0,0/d...@w21000004cfb6f0ff,0:a
File and args: -s

SunOS Release 5.10 Version Generic_138888-03 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Hostname: jupiter
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

Mar 24 01:28:04 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
#
/***************************************************/
/*  the very first thing I check is zpool fibre0   */
/***************************************************/
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

*************************************************
* everything looks fine, okay, thank you to ZFS *
*  ... and then I try to boot to full init 3    *
*************************************************

# exit
svc.startd: Returning to milestone all.
Reading ZFS config: done.
Mounting ZFS filesystems: (51/51)

jupiter console login: root
Password:
Last login: Sat Mar  7 19:39:00 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

**************************************************************
* everything STILL looks fine, and only seconds have passed. *
* Then .. I get bombarded with SEVERITY: Major faults        *
**************************************************************

#
SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 3780a2dd-7381-c053-e186-8112b463c2b7
DESC: The number of I/O errors associated with a ZFS device exceeded
        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
        for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
        attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Mar 24 01:29:00 GMT 2009
PLATFORM: SUNW,Sun-Fire-480R, CSN: -, HOSTNAME: jupiter
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 146dad1d-f195-c2d6-c630-c1adcd58b288
DESC: The number of I/O errors associated with a ZFS device exceeded
        acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
        for more information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
        attempt will be made to activate a hot spare if available.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.

********************************************************
I know that I have been here before after a power failure
with similar messages.
They were not entirely honest about the SEVERITY of the device
faults.  The faults are certainly not "Major faults".
*********************************************************

# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver in progress for 0h0m, 0.02% done, 21h7m to go
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors
#
Mar 24 01:29:53 jupiter ntpdate[733]: no server suitable for synchronization found

***********************************************************
* at this point I go look at my cisco routers and check my
* AC and get things booting. I also curse my new APC gear
* for not signaling a power failure ... but that is another
* story.
***********************************************************

So can I *trust* what I am seeing? Do I really believe that I have a
SEVERE fault in a disk? Last time I did this ( last month actually )
there were two disks faulted. Today there is just one. As usual I will
NOT order a new replacement disk. I just let that zpool sort itself out.
It will take an hour or so to sync up that hot spare.

The machine in question is a production Solaris 10 server:

# uname -a
SunOS jupiter 5.10 Generic_138888-03 sun4u sparc SUNW,Sun-Fire-480R
# cat /etc/release
                       Solaris 10 5/08 s10s_u5wos_10 SPARC
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 24 March 2008

The zpool in question looks like so:

# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH     ALTROOT
fibre0   680G   536G   144G    78%  DEGRADED   -
z0      40.2G   103K  40.2G     0%  ONLINE     -
# zpool status fibre0
  pool: fibre0
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
        device repaired.
 scrub: resilver completed after 1h35m with 0 errors on
        Tue Mar 24 03:04:49 2009
config:

        NAME           STATE     READ WRITE CKSUM
        fibre0         DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c5t0d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            c5t1d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c2t17d0  FAULTED      0     0     0  too many errors
              c2t22d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c5t2d0     ONLINE       0     0     0
            c2t18d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c5t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t21d0    ONLINE       0     0     0
            c5t6d0     ONLINE       0     0     0
        spares
          c2t22d0      INUSE     currently in use

errors: No known data errors

Is there *really* a severe fault in that disk?

# luxadm -v display 21000018625d599d
Displaying information for: 21000018625d599d
Searching directory /dev/es for links to enclosures

DEVICE PROPERTIES for disk: 21000018625d599d
  Vendor:               HPQ
  Product ID:           BD1465822C
  Revision:             HP04
  Serial Num:           3KS36V5N000076218F5R
  Unformatted capacity: 140014.406 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0xffff
  Device Type:          Disk device
  Path(s):

  /dev/rdsk/c2t17d0s2
  /devices/p...@8,600000/SUNW,q...@1/f...@0,0/s...@w21000018625d599d,0:c,raw
    LUN path port WWN:          21000018625d599d
    Host controller port WWN:   210000e08b08f1a1
    Path status:                O.K.

What does the SMART firmware say?
# /root/bin/smartctl -a /dev/rdsk/c2t17d0s0
smartctl version 5.33 [sparc-sun-solaris2.8] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HPQ      BD1465822C       Version: HP04
Serial number: 3KS36V5N000076218F5R
Device type: disk
Transport protocol: IEEE 1394 (SBP-2)
Local Time is: Tue Mar 24 14:09:07 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        68 C
Vendor (Seagate) cache information
  Blocks sent to initiator = 615507364
  Blocks received from initiator = 3004562974
  Blocks read from cache and sent to initiator = 94569699
  Number of read and write commands whose size <= segment size = 185763910
  Number of read and write commands whose size > segment size = 0

Error counter log:
         Errors Corrected by          Total    Correction    Gigabytes    Total
             ECC         rereads/     errors   algorithm     processed    uncorrected
         fast | delayed  rewrites  corrected   invocations  [10^9 bytes]  errors
read:    8952309      0         0    8952309      8952309      999.277        0
write:         0      0         0          0           12     1328.105        0
verify:   934290      0         0     934290       934290      146.816        0

Non-medium error count:        1
Error Events logging not supported

SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background short  Completed        -       31              -   [-   -   -]

It is hard to see, but the total uncorrected error count is zero.

***********************************************
*  So let's just correct the "SEVERE" fault.  *
***********************************************

# zpool detach fibre0 c2t17d0
# zpool detach fibre0 c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 1h35m with 0 errors on
        Tue Mar 24 03:04:49 2009
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          c5t1d0     ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0

errors: No known data errors
# zpool attach fibre0 c5t1d0 c2t17d0
# zpool add fibre0 spare c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 2.86% done, 1h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#

I have also learned that you cannot trust that resilver progress report
either. It will not take 1h18m to complete. If I wait 20 minutes I'll
get *nearly* the same estimate. The process must not be deterministic
in nature.

# zpool status
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool
        will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h39m, 34.24% done, 1h15m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

  pool: z0
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z0            ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s7  ONLINE       0     0     0
            c1t1d0s7  ONLINE       0     0     0

errors: No known data errors
# fmadm faulty -afg
#

I do TOTALLY trust that last line that says "No known data errors",
which makes me wonder if the SEVERE faults are for unknown data
errors :-)

--
Dennis Clarke

sig du jour: "An appeaser is one who feeds a crocodile, hoping it will
eat him last." -- Winston Churchill

[1] I really want to know where PowerChute for Solaris went to.

[2] I would create a zpool of striped mirrors based on multiple USB keys
and on disks on IDE/SATA, with or without compression, and with
copies={1|2|3}, and while running an ON compile I'd pull the USB keys
out and yank the power on the IDE/SATA or fibre disks. ZFS would not
throw a fatal error nor drop a bit of data. Performance suffered, but
data did not.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
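P.S. Since I seem to end up doing this little dance after every long power failure, here is a minimal sketch of it as a script. This is my own illustration, not anything shipped with Solaris: the pool and device names (fibre0, c2t17d0, c5t1d0, c2t22d0) are the ones from this report, and the dry-run wrapper is a hypothetical safety guard of my own. It defaults to just echoing the zpool commands; set APPLY=1 and run it as root to execute them for real.

```shell
#!/bin/sh
# Sketch of the spare/fault clean-up sequence shown above, using the
# pool and device names from this report.  Adjust for your own pool.
POOL=fibre0
BAD=c2t17d0       # disk that ZFS marked FAULTED after the power failure
SURVIVOR=c5t1d0   # healthy half of the degraded mirror
SPARE=c2t22d0     # hot spare that resilvered in

# Safety wrapper: unless APPLY is set, print each command instead of
# executing it, so the script is a dry run by default.
run() {
    if [ -n "$APPLY" ]; then
        "$@"
    else
        echo "$@"
    fi
}

run zpool detach "$POOL" "$BAD"               # drop the "faulted" disk
run zpool detach "$POOL" "$SPARE"             # release the hot spare
run zpool attach "$POOL" "$SURVIVOR" "$BAD"   # re-mirror with the same disk
run zpool add "$POOL" spare "$SPARE"          # put the spare back
```

Note that 'zpool clear fibre0' is the simpler alternative that the status output itself suggests when you believe the fault is transient, at the cost of keeping the spare attached until the resilver sorts itself out.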