Thank you, that did the trick. It's not terribly obvious from the man page, though: the man page says 'zpool detach' detaches a device from a mirror, and I had a raidz2. Since I'm messing with production data, I decided not to chance it based on that description alone. You might consider updating the man page to explain a little more about what the command actually does, and maybe even what the circumstances look like where you would use it.
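For the archives, here's the sequence that got me back to normal, following Eric's suggestion below. The pool and device names are from my setup, and steps 2 and 3 are just my plan for restoring the hot spare rather than anything covered in this thread, so treat them as a sketch and check zpool(1M) before repeating this against your own production data.

1. Detach the failed disk from the spare pairing. The hot spare c8t11d0 then takes its place permanently in the raidz2 vdev.

   # zpool detach tank c8t7d0

2. After physically replacing the failed disk, add the new disk back to the pool as a hot spare.

   # zpool add tank spare c8t7d0

3. Verify that the raidz2 vdev is back to ONLINE and the spare shows up as AVAIL again.

   # zpool status tank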
Actually, an official and easily searchable "What to do when you have a ZFS disk failure" guide with lots of examples would be great. There are a lot of attempts out there, but nothing I've found is comprehensive.

Jason

On Wed, Oct 14, 2009 at 4:23 PM, Eric Schrock <eric.schr...@sun.com> wrote:
> On 10/14/09 14:17, Cindy Swearingen wrote:
>>
>> Hi Jason,
>>
>> I think you are asking how do you tell ZFS that you want to replace the
>> failed disk c8t7d0 with the spare, c8t11d0?
>>
>> I just tried to do this on my Nevada build 124 lab system, simulating a
>> disk failure and using zpool replace to replace the failed disk with
>> the spare. The spare is now busy and it fails. This has to be a bug.
>
> You need to 'zpool detach' the original (c8t7d0).
>
> - Eric
>
>>
>> Another way to recover is if you have a replacement disk for c8t7d0,
>> like this:
>>
>> 1. Physically replace c8t7d0.
>>
>>    You might have to unconfigure the disk first. It depends
>>    on the hardware.
>>
>> 2. Tell ZFS that you replaced it.
>>
>>    # zpool replace tank c8t7d0
>>
>> 3. Detach the spare.
>>
>>    # zpool detach tank c8t11d0
>>
>> 4. Clear the pool or the device specifically.
>>
>>    # zpool clear tank c8t7d0
>>
>> Cindy
>>
>> On 10/14/09 14:44, Jason Frank wrote:
>>>
>>> So, my Areca controller has been complaining via email of read errors
>>> for a couple of days on SATA channel 8. The disk finally gave up last
>>> night at 17:40. I've got to say, I really appreciate the Areca
>>> controller taking such good care of me.
>>>
>>> For some reason, I wasn't able to log into the server last night or in
>>> the morning, probably because my home dir was on the zpool with the
>>> failed disk (although it's a raidz2, so I don't know why that was a
>>> problem). So, I went ahead and rebooted it the hard way this morning.
>>>
>>> The reboot went OK, and I was able to get access to my home directory
>>> by waiting about 5 minutes after authenticating. I checked my zpool,
>>> and it was resilvering. But it had only been running for a few
>>> minutes. Evidently, it didn't start resilvering until I rebooted it. I
>>> would have expected it to do that when the disk failed last night (I
>>> had set up a hot spare disk already).
>>>
>>> All of the zpool commands were taking minutes to complete while c8t7d0
>>> was UNAVAIL, so I offline'd it. When I say all, that includes iostat,
>>> status, upgrade, just about anything non-destructive that I could try.
>>> That was a little odd. Once I offlined the drive, my resilver
>>> restarted, which surprised me. After all, I simply changed an UNAVAIL
>>> drive to OFFLINE; in either case, you can't use it for operations. But
>>> no big deal there. That fixed the login slowness and the zpool command
>>> slowness.
>>>
>>> The resilver completed, and now I'm left with the following zpool
>>> config. I'm not sure how to get things back to normal, though, and I
>>> hate to do something stupid...
>>>
>>> r...@datasrv1:~# zpool status tank
>>>   pool: tank
>>>  state: DEGRADED
>>>  scrub: scrub stopped after 0h10m with 0 errors on Wed Oct 14 15:23:06 2009
>>> config:
>>>
>>>         NAME           STATE     READ WRITE CKSUM
>>>         tank           DEGRADED     0     0     0
>>>           raidz2       DEGRADED     0     0     0
>>>             c8t0d0     ONLINE       0     0     0
>>>             c8t1d0     ONLINE       0     0     0
>>>             c8t2d0     ONLINE       0     0     0
>>>             c8t3d0     ONLINE       0     0     0
>>>             c8t4d0     ONLINE       0     0     0
>>>             c8t5d0     ONLINE       0     0     0
>>>             c8t6d0     ONLINE       0     0     0
>>>             spare      DEGRADED     0     0     0
>>>               c8t7d0   REMOVED      0     0     0
>>>               c8t11d0  ONLINE       0     0     0
>>>             c8t8d0     ONLINE       0     0     0
>>>             c8t9d0     ONLINE       0     0     0
>>>             c8t10d0    ONLINE       0     0     0
>>>         spares
>>>           c8t11d0      INUSE     currently in use
>>>
>>> Since it's not obvious, the spare line had both t7 and t11 indented
>>> under it. When the resilver completed, I yanked the hard drive on
>>> target 7.
>>>
>>> I'm assuming that t11 has the same content as t7, but that's not
>>> necessarily clear from the output above.
>>>
>>> So, now I'm left with the following config. I can't zfs remove t7,
>>> because it's not a hot spare or a cache disk. I can't zfs replace t7
>>> with t11; I'm told that t11 is busy. And I didn't see any other zpool
>>> subcommands that look likely to fix the problem.
>>>
>>> Here are my system details:
>>> SunOS datasrv1 5.11 snv_118 i86pc i386 i86xpv Solaris
>>>
>>> This system is currently running ZFS pool version 16.
>>>
>>> Pool 'tank' is already formatted using the current version.
>>>
>>> How do I tell the system that t11 is the replacement for t7, and how
>>> do I then add t7 as the hot spare (after I replace the disk)?
>>>
>>> Thanks
>>
>
> --
> Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
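For anyone who finds this thread while searching the archives, the two recovery paths discussed above boil down to the following. This is only a sketch using the names from this thread (pool tank, failed disk c8t7d0, hot spare c8t11d0); verify against zpool(1M) on your own release before touching a production pool.

Path A (promote the hot spare to a permanent member of the vdev, which is what resolved things here):

   # zpool detach tank c8t7d0

   The spare c8t11d0 stays in the raidz2 vdev in place of the failed disk. Adding the physically replaced disk back as a new hot spare afterwards, e.g. with "zpool add tank spare c8t7d0", is only a suggestion and isn't covered in the thread above.

Path B (install a replacement disk in the original slot and return the spare, per Cindy's steps above):

   # zpool replace tank c8t7d0    (after physically replacing the disk)
   # zpool detach tank c8t11d0    (release the hot spare)
   # zpool clear tank c8t7d0      (clear the error counts on the device)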