On 10/14/09 14:33, Cindy Swearingen wrote:
Hi Eric,
I tried that and found that I needed to detach and remove
the spare before replacing the failed disk with the spare
disk.
You should just be able to detach 'c0t6d0' in the config below. The
spare (c0t7d0) will assume its place and be removed from the idle spare
list, becoming a "normal" vdev in the process.
- Eric
What actually worked is below.
Thanks,
Cindy
# zpool status test
  pool: test
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 14 14:24:57 2009
config:

        NAME          STATE     READ WRITE CKSUM
        test          DEGRADED     0     0     0
          raidz1-0    DEGRADED     0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            spare-2   DEGRADED     0     0    19
              c0t6d0  UNAVAIL      0     0     0  cannot open
              c0t7d0  ONLINE       0     0     0  32K resilvered
        spares
          c0t7d0      INUSE     currently in use

errors: No known data errors
# zpool detach test c0t7d0
# zpool remove test c0t7d0
# zpool replace test c0t6d0 c0t7d0
# zpool status test
  pool: test
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 14 14:25:47 2009
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0  48.5K resilvered

errors: No known data errors
On 10/14/09 15:23, Eric Schrock wrote:
On 10/14/09 14:17, Cindy Swearingen wrote:
Hi Jason,
I think you are asking how to tell ZFS that you want to replace the
failed disk c8t7d0 with the spare, c8t11d0?
I just tried doing this on my Nevada build 124 lab system, simulating a
disk failure and using zpool replace to replace the failed disk with
the spare. The replace fails because the spare is busy. This has to be a bug.
You need to 'zpool detach' the original (c8t7d0).
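For the pool shown below, that would presumably be just:

# zpool detach tank c8t7d0

after which c8t11d0 should take over c8t7d0's slot in the raidz2 and
leave the spares list.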
- Eric
Another way to recover, if you have a replacement disk for c8t7d0,
is like this:
1. Physically replace c8t7d0.
   You might have to unconfigure the disk first, depending on the
   hardware (see the cfgadm sketch after these steps).
2. Tell ZFS that you replaced it.
# zpool replace tank c8t7d0
3. Detach the spare.
# zpool detach tank c8t11d0
4. Clear the pool or the device specifically.
# zpool clear tank c8t7d0
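(For step 1: if the hardware does require unconfiguring first, that is
usually done with cfgadm. A rough sketch; the attachment point name
below is only an example and varies by controller:

# cfgadm -al | grep c8t7d0          # find the disk's attachment point
# cfgadm -c unconfigure sata1/7     # example ap_id, use the one reported above

Then swap the disk and, if needed, 'cfgadm -c configure' the same
attachment point before step 2.)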
Cindy
On 10/14/09 14:44, Jason Frank wrote:
So, my Areca controller has been complaining via email of read
errors for a couple days on SATA channel 8. The disk finally gave
up last night at 17:40. I've got to say, I really appreciate the Areca
controller taking such good care of me.
For some reason, I wasn't able to log into the server last night or
in the morning, probably because my home dir was on the zpool with
the failed disk (although it's a raidz2, so I don't know why that
was a problem). So, I went ahead and rebooted it the hard way this
morning.
The reboot went OK, and I was able to get access to my home
directory by waiting about 5 minutes after authenticating. I
checked my zpool, and it was resilvering. But, it had only been
running for a few minutes. Evidently, it didn't start resilvering
until I rebooted it. I would have expected it to do that when the
disk failed last night (I had set up a hot spare disk already).
All of the zpool commands were taking minutes to complete while
c8t7d0 was UNAVAIL, so I offline'd it. When I say all, that
includes iostat, status, upgrade, just about anything
non-destructive that I could try. That was a little odd. Once I
offlined the drive, my resilver restarted, which surprised me.
After all, I simply changed an UNAVAIL drive to OFFLINE; in either
case, you can't use it for operations. But no big deal there. That
fixed the login slowness and the zpool command slowness.
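(The offline step was presumably just '# zpool offline tank c8t7d0'.)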
The resilver completed, and now I'm left with the following zpool
config. I'm not sure how to get things back to normal though, and I
hate to do something stupid...
r...@datasrv1:~# zpool status tank
  pool: tank
 state: DEGRADED
 scrub: scrub stopped after 0h10m with 0 errors on Wed Oct 14 15:23:06 2009
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz2       DEGRADED     0     0     0
            c8t0d0     ONLINE       0     0     0
            c8t1d0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            c8t4d0     ONLINE       0     0     0
            c8t5d0     ONLINE       0     0     0
            c8t6d0     ONLINE       0     0     0
            spare      DEGRADED     0     0     0
              c8t7d0   REMOVED      0     0     0
              c8t11d0  ONLINE       0     0     0
            c8t8d0     ONLINE       0     0     0
            c8t9d0     ONLINE       0     0     0
            c8t10d0    ONLINE       0     0     0
        spares
          c8t11d0      INUSE     currently in use
In case the indentation gets mangled in email: the spare line has both
t7 and t11 indented under it.
When the resilver completed, I yanked the hard drive on target 7.
I'm assuming that t11 has the same content as t7, but that's not
necessarily clear from the output above.
So, now I'm left with the following config. I can't 'zpool remove' t7,
because it's not a hot spare or a cache disk. I can't 'zpool replace'
t7 with t11; I'm told that t11 is busy. And I didn't see any other
zpool subcommands that look likely to fix the problem.
Here are my system details:
SunOS datasrv1 5.11 snv_118 i86pc i386 i86xpv Solaris
This system is currently running ZFS pool version 16.
Pool 'tank' is already formatted using the current version.
How do I tell the system that t11 is the replacement for t7, and how
do I then add t7 as the hot spare (after I replace the disk)?
Thanks
--
Eric Schrock, Fishworks http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss