What has happened is that your device has started reporting errors but
is still available to the system; that is, ZFS is still able to
ldi_open() the underlying device.  This is a strange failure mode for
the device (you may want to investigate how that's possible), but ZFS
is functioning as designed.  You can verify this by running 'dtrace -n
vdev_reopen:entry', which should show ZFS attempting to reopen the
device roughly once a minute.  We currently detect device failure only
when the device "goes away" entirely.
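
Off the top of my head, something like this will timestamp each
attempt (untested; the predicate is there because vdev_path can be
NULL for interior vdevs):

  # dtrace -qn 'fbt::vdev_reopen:entry /args[0]->vdev_path != NULL/
      { printf("%Y reopening %s\n", walltimestamp,
          stringof(args[0]->vdev_path)); }'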

A future enhancement is to do predictive analysis based on error
rates.  This will leverage the full power of FMA diagnosis, allowing
us to perform SERD analysis and use past history to predict future
failure.  It will also incorporate the SMART predictive failure bit
when available.  We haven't started work on this yet, but we have a
plan for doing so.
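
For the curious, the basic idea behind SERD is simple: the engine
fires when more than N errors arrive within a sliding time window T.
Here's a rough, untested sketch of that threshold check in DTrace,
counting reopen attempts as a stand-in for error events (the N=5,
T=10min values are invented, a fixed-window reset only approximates a
true sliding window, and the real engines will live in fmd(1M)):

  #!/usr/sbin/dtrace -s
  /*
   * SERD-style threshold sketch: count vdev_reopen() calls as a
   * proxy for error events and complain when more than 5 land in a
   * 10-minute window.
   */
  #pragma D option quiet

  fbt::vdev_reopen:entry
  {
          nerrs++;
  }

  fbt::vdev_reopen:entry
  /nerrs > 5/
  {
          printf("%Y: %d reopens in window, device looks suspect\n",
              walltimestamp, nerrs);
  }

  tick-10m
  {
          nerrs = 0;
  }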

- Eric

On Tue, May 16, 2006 at 07:02:37PM +1000, grant beattie wrote:
> running b37 on amd64. after removing power from a disk configured as
> one side of a mirror, 10 minutes have passed and ZFS has still not
> offlined it.
> 
> # zpool status tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           mirror    ONLINE       0     0     0
>             c4t0d0  ONLINE      14 6.05K     0
>             c4t1d0  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> # grep 'Hardware_Error' /var/adm/messages | wc -l
>     7632
> 
> only after I manually ran "zpool detach tank c4t0d0" did the SCSI
> errors stop. I would have expected the disk to be offlined
> automatically, which is exactly what happened when I did the same
> test with an SVM mirror.
> 
> is this a bug?
> 
> grant.

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
