On Tue, Dec 12, 2006 at 02:38:22PM -0500, James F. Hranicky wrote:
> 
> Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
>   100%  fault.fs.zfs.device
> 
>         Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
>            Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
>                FRU: -
> 
> I'm not really sure what it means.

Hmmm, it means that we correctly noticed that the device had failed, but
for whatever reason the ZFS FMA agent didn't replace the drive with a
hot spare.  I am cleaning up the hot spare behavior as we speak, so I
will try to reproduce this.
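
In the meantime you can attach the spare by hand.  Roughly the following
should do it -- note that the pool and device names here (tank, c1t3d0,
c1t9d0) are just examples, so substitute whatever 'zpool status' shows
for your setup:

    fmdump -v -u 1319464e-7a8c-e65b-962e-db386e90f7f2
                                 # show which pool/vdev the fault refers to
    zpool status tank            # identify the FAULTED disk and an AVAIL spare
    zpool replace tank c1t3d0 c1t9d0
                                 # attach spare c1t9d0 in place of failed c1t3d0
    zpool detach tank c1t3d0     # after the resilver: keep the spare permanently
                                 # (or detach c1t9d0 once the original is repaired)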

> Well, as long as I know which device is affected :-> If "zpool status"
> doesn't return, it may be difficult to figure out.
> 
> Do you know if the SATA controllers in a Thumper can better handle this
> problem?

I will be starting a variety of experiments in this vein in the near
future.  Others may be able to describe their experiences so far.  How
exactly did you 'spin down' the drives in question?  Is there a
particular failure mode you're interested in?

> Do you have an idea as to when this might be available?

It will be a while before the complete functionality is finished.  I
have begun the work, but there are several distinct phases.  First, I
am cleaning up the existing hot spare behavior.  Second, I'm adding
proper hotplug support to ZFS so that it detects device removal without
freaking out and correctly resilvers/replaces drives when they are
plugged back in.  Finally, I'll be adding a ZFS diagnosis engine to both
analyze ZFS faults and consume SMART data to predict disk failure
and proactively offline devices.  I would estimate that it will be a few
months before I get all of this into Nevada.
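
In the meantime, the manual equivalents of those hotplug and proactive
offline steps look roughly like this (again, the pool and device names
are made up):

    zpool offline tank c1t3d0   # pull a disk you suspect is dying out of service
    # ... physically reseat or swap the disk ...
    zpool online tank c1t3d0    # bring it back and let ZFS resilver it
    zpool status tank           # watch the resilver and confirm the pool is healthy
    # if the disk was swapped for a new one, run 'zpool replace tank c1t3d0'
    # instead of 'zpool online'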

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
