On Thu, Sep 21, 2006 at 04:25:44AM -0700, Liam McBrien wrote:
> Hi there,
>
> Not sure if this is a known bug (or even if it's a bug at all), but
> zfs seems to get confused when several consecutive temporary disk
> faults occur involving a hot spare. I couldn't find anything related
> to this on this forum, so here goes:
>
> I'm testing this on a SunBlade 2000 hooked up to a T3 via STMS. The OS
> version is snv48.
>
> This is a bit confusing, so bear with me. Basically, the problem occurs when
> the following happens:
>
> - a pool is created with a hot spare
> - a data disk is faulted (so that the spare steps in)
> - the data disk is brought back online
> - the hot spare is faulted
> - the hot spare is brought back online and detached from the pool (to
>   stop it from acting as a spare for the data disc that faulted)
> - the original data disc is faulted again
>
> When the above takes place, the spare ends up replacing the data disc
> completely in the pool but it still shows up as a spare. This occurs
> with mirror, raidz1 and raidz2 volumes.

Yes, this sounds like a variation of a known bug that's on my queue to
look at. Basically, the way we determine whether something is a spare
is rather broken, and you can confuse ZFS to the point of doing the
wrong thing. I'll take a specific look at this case and see whether it
has the same underlying root cause.

> On another note, when a disk is faulted the console output says
> "AUTO-RESPONSE: No automated response will occur." - shouldn't this
> mention that a hot spare action will happen?

Yep. I'll take care of this when I do the next phase of ZFS/FMA
integration.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
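
For anyone trying to reproduce the sequence from the original report, it may help to write it down as an ordered checklist. The sketch below only builds and prints the steps; it does not run anything. The pool and device names are hypothetical, and the report does not say how the disks were faulted or brought back, so those steps are left as manual actions. It also assumes that "detached from the pool" refers to `zpool detach`, which the report does not confirm.

```python
from typing import List, Tuple

# (kind, text): kind is "zpool" for a command line, "manual" for an
# out-of-band action whose exact method the report does not specify.
Step = Tuple[str, str]

LAYOUTS = ("mirror", "raidz1", "raidz2")  # layouts named in the report


def repro_steps(pool: str, data_disks: List[str], spare: str,
                layout: str = "mirror") -> List[Step]:
    """Return the reported fault sequence as an ordered checklist."""
    if layout not in LAYOUTS:
        raise ValueError(f"unsupported layout: {layout}")
    if len(data_disks) < 2:
        raise ValueError("need at least two data disks")
    victim = data_disks[0]  # the data disk that gets faulted twice
    create = f"zpool create {pool} {layout} {' '.join(data_disks)} spare {spare}"
    return [
        ("zpool", create),
        ("manual", f"fault data disk {victim} so that the spare steps in"),
        ("manual", f"bring data disk {victim} back online"),
        ("manual", f"fault hot spare {spare}"),
        ("manual", f"bring hot spare {spare} back online"),
        # Assumption: the reporter's "detach" corresponds to zpool detach.
        ("zpool", f"zpool detach {pool} {spare}"),
        ("manual", f"fault data disk {victim} again"),
        ("zpool", f"zpool status {pool}"),
    ]


if __name__ == "__main__":
    # Hypothetical pool and device names, for illustration only.
    for n, (kind, text) in enumerate(
            repro_steps("tank", ["c1t0d0", "c1t1d0"], "c1t2d0"), start=1):
        print(f"{n}. [{kind}] {text}")
```

Checking `zpool status` after the last step is where the reported symptom should be visible: the spare has taken over the data disk's place in the pool while still being listed as a spare.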