While testing a zpool with a different storage adapter using my "blkdev"
device, I ran a test that makes a disk unavailable -- all attempts to
read from it return EIO.
I expected my configuration (a 3 disk test: 2 disks in a RAIDZ plus a
hot spare) to handle this by automatically activating the hot spare.
But I'm finding that ZFS does not behave this way -- if only some I/Os
fail, the drive is faulted and the hot spare is brought in, but if ZFS
decides that the label is gone, it makes no attempt to recruit a hot spare.
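For reference, the test pool was created roughly like this (device names
match the zpool status output at the end; I may be glossing over a detail
or two of the exact invocation):

    pfexec zpool create testpool raidz c2t3d0 c2t3d1 spare c2t3d2

So the spare (c2t3d2) is definitely attached to the pool -- it shows up as
AVAIL in the status output -- it just never gets pulled in.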
I have added FMA notification to my blkdev driver -- it posts
device.no_response or device.invalid_state ereports (per the
ddi_fm_ereport_post() man page) in certain failure scenarios.
I *suspect* the problem is in the FMA notification path for zfs-retire --
the event isn't being interpreted in a way that lets zfs-retire figure
out that the drive is toasted.
Of course, this is just an educated guess on my part; I'm neither a ZFS
nor an FMA expert.
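I can gather more data if it helps -- e.g. the raw ereports and the fmd
module counters, using nothing more exotic than the stock FMA tools,
something along these lines:

    # show the ereports that actually reached fmd
    pfexec fmdump -eV

    # per-module counters -- are zfs-diagnosis and zfs-retire accepting events?
    pfexec fmstat

    # list any faults that were actually diagnosed
    pfexec fmadm faulty

If the ereports show up in fmdump but zfs-retire never does anything with
them, that would seem to back up the guess above.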
Am I missing something here? Under what conditions can I expect hot
spares to be recruited?
My zpool status showing the results is below.
- Garrett
> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t3d1  UNAVAIL      9   132     0  experienced I/O failures
        spares
          c2t3d2    AVAIL

errors: No known data errors