Ok, so after removing the spares marked as AVAIL and re-adding them, I put
myself back in the "you're effed, dude" boat. What I should have done at that
point is a zpool export/import, which would have resolved it.
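
That is, simply exporting and reimporting the main pool:

zpool export idgsun02
zpool import idgsun02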

So what I did was recreate the steps that got me into the state where the AVAIL 
spares were listed first, rather than the FAULTED ones (which allowed me to 
remove them as demonstrated in my previous email).

I created another pool sharing the same spares, removed the spares from it,
then destroyed it, and then exported and imported the main pool again. Once
that operation completed, I was able to remove the spares from the main pool,
export/import it once more, and the problem is now resolved.

zpool create cleanup c5t3d0 c4t3d0 spare c0t6d0 c5t5d0
zpool remove cleanup c0t6d0 c5t5d0
zpool destroy cleanup
zpool export idgsun02
zpool import idgsun02
zpool remove idgsun02 c0t6d0
zpool remove idgsun02 c5t5d0
zpool export idgsun02
zpool import idgsun02

And the resultant zpool status is this:

[IDGSUN02:/] root# zpool status 
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL   
          c5t5d0    AVAIL   

errors: No known data errors

Hopefully this might help someone in the future if they get into this situation.
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An 
Ingram Digital Company
Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com

On Jul 9, 2010, at 11:38 AM, Ryan Schwartz wrote:

> Hi Cindy,
> 
> Not sure exactly when the drives went into this state, but it is likely that 
> it happened when I added a second pool, added the same spares to the second 
> pool, then later destroyed the second pool. There have been no controller or 
> any other hardware changes to this system - it is all original parts. The 
> device names are valid; the issue is that each is listed twice - once as a 
> spare which is AVAIL and once as a spare which is FAULTED.
> 
> I've tried zpool remove, zpool offline, zpool clear, and zpool export/import, 
> and I've unconfigured the drives via cfgadm and tried a remove again - nothing 
> works to remove the FAULTED spares.
> 
> I was just able to remove the AVAIL spares, but only because they were listed 
> first in the spares list:
> 
> [IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c0t6d0  
> [IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c5t5d0
> [IDGSUN02:/dev/dsk] root# zpool status
>  pool: idgsun02
> state: ONLINE
> scrub: none requested
> config:
> 
>        NAME        STATE     READ WRITE CKSUM
>        idgsun02    ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c0t1d0  ONLINE       0     0     0
>            c0t5d0  ONLINE       0     0     0
>            c1t1d0  ONLINE       0     0     0
>            c1t5d0  ONLINE       0     0     0
>            c6t1d0  ONLINE       0     0     0
>            c6t5d0  ONLINE       0     0     0
>            c7t1d0  ONLINE       0     0     0
>            c7t5d0  ONLINE       0     0     0
>            c4t1d0  ONLINE       0     0     0
>            c4t5d0  ONLINE       0     0     0
>          raidz2    ONLINE       0     0     0
>            c0t0d0  ONLINE       0     0     0
>            c0t4d0  ONLINE       0     0     0
>            c1t0d0  ONLINE       0     0     0
>            c1t4d0  ONLINE       0     0     0
>            c6t0d0  ONLINE       0     0     0
>            c6t4d0  ONLINE       0     0     0
>            c7t0d0  ONLINE       0     0     0
>            c7t4d0  ONLINE       0     0     0
>            c4t0d0  ONLINE       0     0     0
>            c4t4d0  ONLINE       0     0     0
>        spares
>          c0t6d0    FAULTED   corrupted data
>          c5t5d0    FAULTED   corrupted data
> 
> errors: No known data errors
> 
> What's interesting is that running the zpool remove commands a second time 
> has no effect (presumably because zpool is using GUIDs internally).
> 
> I may have, at one point, tried to re-add the drives after seeing the FAULTED 
> state and not being able to remove them, which is probably where the second 
> set of entries came from. (Pretty much exactly what's described here: 
> http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFaultedSpares).
> 
> What I really need is to be able to remove the two bogus faulted spares, and 
> I think the only way I'll be able to do that is via the GUIDs, since the 
> (valid) vdev path is shown as the same for each. I would guess zpool is 
> attempting to remove the device by path and only ever matching the first 
> entry. I've got a support case open, but no traction on that as of yet.
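> 
> For what it's worth, the GUIDs themselves should be visible in the pool 
> config or on the disk labels - something along these lines (and per Cindy's 
> note below, zpool remove may well refuse a GUID, but this at least shows 
> what ZFS thinks the spare entries are):
> 
> # zdb -C idgsun02                # cached pool config; the spares entries
>                                  # carry a guid field
> # zdb -l /dev/dsk/c0t6d0s0       # or read the guid off the device label
>                                  # (assuming the usual whole-disk s0 slice)
> # zpool remove idgsun02 <guid of the FAULTED spare entry>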
> -- 
> Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - 
> An Ingram Digital Company
> Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com
> 
> On Jul 8, 2010, at 5:25 PM, Cindy Swearingen wrote:
> 
>> Hi Ryan,
>> 
>> What events led up to this situation? I've seen a similar problem when a 
>> system upgrade caused the controller numbers of the spares to change. In 
>> that case, the workaround was to export the pool, correct the spare device 
>> names, and import the pool. I'm not sure if this workaround applies to your 
>> case. Do you know if the spare device names changed?
>> 
>> My hunch is that you could export this pool, reconnect the spare
>> devices, and reimport the pool, but I'd rather test this on my own pool 
>> first and I can't reproduce this problem.
>> 
>> I don't think you can remove the spares by their GUID. At least,
>> I couldn't.
>> 
>> You said you tried to remove the spares with zpool remove.
>> 
>> Did you try this command:
>> 
>> # zpool remove idgsun02 c0t6d0
>> 
>> Or this command, which I don't think would work, but you would
>> get a message like this:
>> 
>> # zpool remove idgsun02 c0t6d0s0
>> cannot remove c0t6d0s0: no such device in pool
>> 
>> Thanks,
>> 
>> Cindy
