So, my Areca controller has been complaining via email about read errors on
SATA channel 8 for a couple of days.  The disk finally gave up last night at
17:40.  I've got to say, I really appreciate the Areca controller taking such
good care of me.

For some reason, I wasn't able to log into the server last night or this
morning, probably because my home dir was on the zpool with the failed disk
(although it's a raidz2, so I don't know why that was a problem).  So, I went
ahead and rebooted it the hard way this morning.

The reboot went OK, and I was able to get access to my home directory by
waiting about 5 minutes after authenticating.  I checked my zpool, and it was
resilvering, but it had only been running for a few minutes.  Evidently, it
didn't start resilvering until I rebooted.  I would have expected that to
happen when the disk failed last night (I had already set up a hot spare
disk).

All of the zpool commands were taking minutes to complete while c8t7d0 was
UNAVAIL, so I offlined it.  When I say all, that includes iostat, status,
upgrade, just about anything non-destructive I could try.  That was a
little odd.  Once I offlined the drive, my resilver restarted, which surprised
me; after all, I simply changed an UNAVAIL drive to OFFLINE, and in either
case you can't use it for operations.  But no big deal there.  That fixed both
the login slowness and the zpool command slowness.
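
For reference, the offline was just the standard command, using the pool and
device names you can see in the status output below:

# zpool offline tank c8t7d0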

The resilver completed, and now I'm left with the following zpool config.  I'm
not sure how to get things back to normal, though, and I hate to do something
stupid...

root@datasrv1:~# zpool status tank
  pool: tank
 state: DEGRADED
 scrub: scrub stopped after 0h10m with 0 errors on Wed Oct 14 15:23:06 2009
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c8t0d0     ONLINE       0     0     0
            c8t1d0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            c8t4d0     ONLINE       0     0     0
            c8t5d0     ONLINE       0     0     0
            c8t6d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c8t7d0   REMOVED      0     0     0
              c8t11d0  ONLINE       0     0     0
            c8t8d0     ONLINE       0     0     0
            c8t9d0     ONLINE       0     0     0
            c8t10d0    ONLINE       0     0     0
        spares
          c8t11d0      INUSE     currently in use

In case the indentation doesn't survive posting: the spare line has both t7
and t11 indented under it.

When the resilver completed, I yanked the hard drive on target 7.

I'm assuming that t11 has the same content as t7, but that's not necessarily 
clear from the output above.
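
I suppose a full scrub would be one way to confirm that everything on t11 is
there and readable:

# zpool scrub tank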

So, now I'm left with the config shown above.  I can't zpool remove t7,
because it's not a hot spare or a cache disk.  I can't zpool replace t7 with
t11; I'm told that t11 is busy.  And I didn't see any other zpool subcommands
that looked likely to fix the problem.
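
For reference, these are the commands I mean:

# zpool remove tank c8t7d0
# zpool replace tank c8t7d0 c8t11d0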

Here are my system details:
SunOS datasrv1 5.11 snv_118 i86pc i386 i86xpv Solaris

This system is currently running ZFS pool version 16.

Pool 'tank' is already formatted using the current version.

How do I tell the system that t11 is the replacement for t7, and how do I then
add t7 as the hot spare (after I replace the disk)?
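
My untested guess is that detaching the dead disk would promote the spare to a
permanent member of the raidz2, and that the new disk could then be added back
as a spare, along these lines:

# zpool detach tank c8t7d0
# zpool add tank spare c8t7d0    (after the physical replacement)

But I'd rather hear from someone who's actually done this before I start
detaching things.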

Thanks