On Tue, May 16, 2006 at 10:13:46AM -0700, Eric Schrock wrote:
> What has happened is that your device has started reporting errors, but
> is still available on the system, i.e. ZFS is still able to ldi_open()
> the underlying device. This seems like a strange failure mode for the
> device (you may want to investigate how that's possible), but ZFS is
> functioning as designed. You can verify this by doing 'dtrace -n
> vdev_reopen:entry', which should show ZFS attempting to reopen the
> device once a minute or so. We currently only detect device failure
> when the device "goes away".
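(For anyone following along, Eric's suggested probe can be made a little more readable with a timestamp; the output formatting below is my own addition, not part of his one-liner, and it needs root on a system with DTrace:)

```shell
# Watch ZFS trying to reopen a failed vdev; with a healthy-but-erroring
# device you should see one firing roughly every minute, per Eric's note.
# The walltimestamp/printf decoration is illustrative only.
dtrace -qn 'fbt::vdev_reopen:entry {
    printf("%Y  vdev_reopen(vdev=%p)\n", walltimestamp, arg0);
}'
```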
hi Eric,

you're right, the aac card appears to offline the disk, but the LUN is
still available (though it's an empty device). I'll capture some more
info when I try this again tomorrow.

what I find interesting is that the SCSI errors were continuous for 10
minutes before I detached it; ZFS wasn't backing off at all. it was
flooding the VGA console quicker than the console could print it all :)
from what you said above, once per minute would have been more
desirable.

I wonder why, given that ZFS knew there was a problem with this disk,
it wasn't marked FAULTED and the pool DEGRADED? I don't know enough
about the internals to say, but SVM happily offlined the device after a
short burst of errors - that's certainly more friendly and expected. is
there any way I can get the same failure mode with ZFS?

> A future enhancement is to do predictive analysis based on error rates.
> This will leverage the full power of FMA diagnosis, allowing us to
> perform SERD analysis and incorporate past history as a mechanism for
> predicting future failure. This will also incorporate the SMART
> predictive failure bit when available. We haven't started work on this
> yet, but we have a plan for doing so.

that would be cool, too :)

grant.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss