On Tue, May 16, 2006 at 10:13:46AM -0700, Eric Schrock wrote:
> What has happened is that your device has started reporting errors, but
> is still available on the system, i.e. ZFS is still able to ldi_open()
> the underlying device. This seems like a strange failure mode for the
> device (you may want to investigate how that's possible), but ZFS is
> functioning as designed. You can verify this by doing 'dtrace -n
> vdev_reopen:entry', which should show ZFS attempting to reopen the
> device once a minute or so. We currently only detect device failure
> when the device "goes away".
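(For anyone following along, Eric's suggested probe can be made a little more readable with a timestamp; the output formatting below is my own addition, not part of his one-liner, and it needs root on a system with DTrace:)

```shell
# Watch ZFS trying to reopen a failed vdev; with a healthy-but-erroring
# device you should see one firing roughly every minute, per Eric's note.
# The walltimestamp/printf decoration is illustrative only.
dtrace -qn 'fbt::vdev_reopen:entry {
    printf("%Y  vdev_reopen(vdev=%p)\n", walltimestamp, arg0);
}'
```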
hi Eric,

you're right, the aac card appears to offline the disk, but the LUN is
still available (though it's an empty device). I'll capture some more
info when I try this again tomorrow.

what I find interesting is that the SCSI errors were continuous for 10
minutes before I detached it; ZFS wasn't backing off at all. it was
flooding the VGA console quicker than the console could print it all :)
from what you said above, once per minute would have been more
desirable.

I wonder why, given that ZFS knew there was a problem with this disk,
it wasn't marked FAULTED and the pool DEGRADED? I don't know enough
about the internals to say, but SVM happily offlined the device after a
short burst of errors - that's certainly more friendly and expected. is
there any way I can get the same failure mode with ZFS?

> A future enhancement is to do predictive analysis based on error rates.
> This will leverage the full power of FMA diagnosis, allowing us to
> perform SERD analysis and incorporate past history as a mechanism for
> predicting future failure. This will also incorporate the SMART
> predictive failure bit when available. We haven't started work on this
> yet, but we have a plan for doing so.

that would be cool, too :)

grant.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss