Ok, so after removing the spares marked as AVAIL and re-adding them, I put myself back in the "you're effed, dude" boat. What I should have done at that point is a zpool export/import, which would have resolved it.
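For anyone who wants to script that export/import cycle, here is a minimal sketch as a dry-run shell helper. The function name reimport_pool and the DRY_RUN variable are made up for illustration; they are not part of zpool, and this was not run on the system in this thread. Keep in mind that exporting a pool unmounts its file systems, so pick a quiet window.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): export a pool and
# import it again. With DRY_RUN=1 (the default) it only prints the
# commands it would run, so the sequence can be reviewed first.

reimport_pool() {
    pool=$1
    if [ -z "$pool" ]; then
        echo "usage: reimport_pool POOL" >&2
        return 1
    fi
    for action in export import; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "zpool $action $pool"
        else
            # Stop at the first failure so a failed export is not
            # followed by a pointless import.
            zpool "$action" "$pool" || return 1
        fi
    done
}
```

Calling `reimport_pool idgsun02` prints the two zpool commands; setting DRY_RUN=0 first would run them for real.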
So what I did was recreate the steps that got me into the state where the AVAIL spares were listed first, rather than the FAULTED ones (which allowed me to remove them as demonstrated in my previous email). I created another pool sharing the same spares, removed the spares from it, destroyed it, then exported and imported the main pool again. Once that operation completed, I was able to remove the spares again and export/import the pool, and the problem is now resolved.

zpool create cleanup c5t3d0 c4t3d0 spare c0t6d0 c5t5d0
zpool remove cleanup c0t6d0 c5t5d0
zpool destroy cleanup
zpool export idgsun02
zpool import idgsun02
zpool remove idgsun02 c0t6d0
zpool remove idgsun02 c5t5d0
zpool export idgsun02
zpool import idgsun02

And the resultant zpool status is this:

[IDGSUN02:/] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL

errors: No known data errors

Hopefully this might help someone in the future if they get into this situation.
--
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com

On Jul 9, 2010, at 11:38 AM, Ryan Schwartz wrote:

> Hi Cindy,
>
> Not sure exactly when the drives went into this state, but it is likely that
> it happened when I added a second pool, added the same spares to the second
> pool, then later destroyed the second pool.
> There have been no controller or any other hardware changes to this system -
> it is all original parts. The device names are valid; the issue is that they
> are listed twice - once for a spare which is AVAIL and another time for the
> spare which is FAULTED.
>
> I've tried zpool remove, zpool offline, zpool clear, zpool export/import,
> and I've unconfigured the drives via cfgadm and tried a remove; nothing works
> to remove the FAULTED spares.
>
> I was just able to remove the AVAIL spares, but only since they were listed
> first in the spares list:
>
> [IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c0t6d0
> [IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c5t5d0
> [IDGSUN02:/dev/dsk] root# zpool status
>   pool: idgsun02
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         idgsun02    ONLINE       0     0     0
>           raidz2    ONLINE       0     0     0
>             c0t1d0  ONLINE       0     0     0
>             c0t5d0  ONLINE       0     0     0
>             c1t1d0  ONLINE       0     0     0
>             c1t5d0  ONLINE       0     0     0
>             c6t1d0  ONLINE       0     0     0
>             c6t5d0  ONLINE       0     0     0
>             c7t1d0  ONLINE       0     0     0
>             c7t5d0  ONLINE       0     0     0
>             c4t1d0  ONLINE       0     0     0
>             c4t5d0  ONLINE       0     0     0
>           raidz2    ONLINE       0     0     0
>             c0t0d0  ONLINE       0     0     0
>             c0t4d0  ONLINE       0     0     0
>             c1t0d0  ONLINE       0     0     0
>             c1t4d0  ONLINE       0     0     0
>             c6t0d0  ONLINE       0     0     0
>             c6t4d0  ONLINE       0     0     0
>             c7t0d0  ONLINE       0     0     0
>             c7t4d0  ONLINE       0     0     0
>             c4t0d0  ONLINE       0     0     0
>             c4t4d0  ONLINE       0     0     0
>         spares
>           c0t6d0    FAULTED   corrupted data
>           c5t5d0    FAULTED   corrupted data
>
> errors: No known data errors
>
> What's interesting is that running the zpool remove commands a second time
> has no effect (presumably because zpool is using GUID internally).
>
> I may have, at one point, tried to re-add the drive after seeing the state
> FAULTED and not being able to remove it, which is probably where the second
> set of entries came from. (Pretty much exactly what's described here:
> http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFaultedSpares).
>
> What I really need is to be able to remove the two bogus faulted spares, and
> I think the only way I'll be able to do that is via the GUIDs, since the
> (valid) vdev path is shown as the same for each. I would guess zpool is
> attempting to remove the device. I've got a support case open, but no
> traction on that as of yet.
> --
> Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. -
> An Ingram Digital Company
> Mob: (608) 886-3513 ▪ ryan.schwa...@ingramdigital.com
>
> On Jul 8, 2010, at 5:25 PM, Cindy Swearingen wrote:
>
>> Hi Ryan,
>>
>> What events led up to this situation? I've seen a similar problem when a
>> system upgrade caused the controller numbers of the spares to change. In
>> that case, the workaround was to export the pool, correct the spare device
>> names, and import the pool. I'm not sure if this workaround applies to your
>> case. Do you know if the spare device names changed?
>>
>> My hunch is that you could export this pool, reconnect the spare
>> devices, and reimport the pool, but I'd rather test this on my own pool
>> first and I can't reproduce this problem.
>>
>> I don't think you can remove the spares by their GUID. At least,
>> I couldn't.
>>
>> You said you tried to remove the spares with zpool remove.
>>
>> Did you try this command:
>>
>> # zpool remove idgsun02 c0t6d0
>>
>> Or this command, which I don't think would work, but you would
>> get a message like this:
>>
>> # zpool remove idgsun02 c0t6d0s0
>> cannot remove c0t6d0s0: no such device in pool
>>
>> Thanks,
>>
>> Cindy
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss