On Wed, May 17, 2006 at 03:22:34AM +1000, grant beattie wrote:
>
> what I find interesting is that the SCSI errors were continuous for 10
> minutes before I detached it, ZFS wasn't backing off at all. it was
> flooding the VGA console quicker than the console could print it all
> :) from what you said above, once per minute would have been more
> desirable.
The "once per minute" is related to the frequency at which ZFS tries to reopen the device. Regardless, ZFS will try to issue I/O to the device whenever asked. If you believe the device is completely broken, the correct procedure (as documented in the ZFS Administration Guide), is to 'zpool offline' the device until you are able to repair it. > I wonder why, given that ZFS knew there was a problem with this disk, > that it wasn't marked FAULTED and the pool DEGRADED? This is the future enhancement that I described below. We need more sophisticated analysis than simply 'N errors = FAULTED', and that's what FMA provides. It will allow us to interact with larger fault management (such as correlating SCSI errors, identifying controller failure, and more). ZFS is a intentionally dumb. Each subsystem is responsible for reporting errors, but coordinated fault diagnosis has to happen at a higher level. > I don't know enough about the internals to know why SVM happily > offlined the device after a short burst of errors - that's certainly > more friendly and expected. is there any way I can get the same > failure mode with ZFS? Not currently. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss