On Thu, Sep 21, 2006 at 04:25:44AM -0700, Liam McBrien wrote:
> Hi there,
> 
> Not sure if this is a known bug (or even if it's a bug at all), but
> zfs seems to get confused when several consecutive temporary disk
> faults occur involving a hot spare. I couldn't find anything related
> to this on this forum, so here goes:
> 
> I'm testing this on a SunBlade 2000 hooked up to a T3 via STMS. The OS 
> version is snv48.
> 
> This is a bit confusing, so bear with me. Basically, the problem occurs when 
> the following happens:
> 
> - a pool is created with a hot spare
> - a data disk is faulted (so that the spare steps in)
> - the data disk is brought back online
> - the hot spare is faulted
> - the hot spare is brought back online and detached from the pool (to
>   stop it from acting as a spare for the data disk that faulted)
> - the original data disk is faulted again
> 
> When the above takes place, the spare ends up replacing the data disk
> completely in the pool, but it still shows up as a spare. This occurs
> with mirror, raidz1 and raidz2 volumes.

Yes, this sounds like a variation of a known bug that's on my queue to
look at.  Basically, the way we determine if something is a spare or not
is rather broken, and you can confuse ZFS to the point of doing the
wrong thing.  I'll take a specific look at this case and see if it's the
same underlying root cause.

> On another note, when a disk is faulted the console output says
> "AUTO-RESPONSE: No automated response will occur." - shouldn't this
> mention that a hot spare action will happen?

Yep.  I'll take care of this when I do the next phase of ZFS/FMA
integration.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
