On 04/05/10 05:28 AM, Eric Schrock wrote:
On Apr 5, 2010, at 3:38 AM, Garrett D'Amore wrote:
Am I missing something here?  Under what conditions can I expect hot spares to 
be recruited?
Hot spares are activated by the zfs-retire agent in response to a list.suspect 
event containing one of the following faults:

        fault.fs.zfs.vdev.io
        fault.fs.zfs.vdev.checksum
        fault.fs.zfs.device

The last of these (fault.fs.zfs.device) is what is diagnosed when a label is 
corrupted.  What software are you running?  Have you confirmed that you are 
getting one of these faults?  What does 'fmdump -V' show?  Does doing a 'zpool 
replace c2t3d1 c2t3d2' by hand succeed?

I see ereport.fs.zfs.io_failure, and ereport.fs.zfs.probe_failure. Also, ereport.io.service.lost and ereport.io.device.inval_state. There is indeed a fault.fs.zfs.device in the list as well.
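
To make that comparison concrete, here is a minimal sketch that checks the classes I listed against the three faults Eric named. It only does a set comparison on class names copied from this thread; the function and variable names are made up for illustration, and it does not query fmd or model how the zfs-retire agent actually makes its decision.

```python
# Fault classes that, per Eric's list, lead zfs-retire to activate a hot spare.
SPARE_TRIGGER_FAULTS = {
    "fault.fs.zfs.vdev.io",
    "fault.fs.zfs.vdev.checksum",
    "fault.fs.zfs.device",
}


def spare_triggers(observed_classes):
    """Return the observed classes that appear in the trigger list, sorted."""
    return sorted(set(observed_classes) & SPARE_TRIGGER_FAULTS)


# Classes reported in this message (hypothetical variable name).
observed = [
    "ereport.fs.zfs.io_failure",
    "ereport.fs.zfs.probe_failure",
    "ereport.io.service.lost",
    "ereport.io.device.inval_state",
    "fault.fs.zfs.device",
]

print(spare_triggers(observed))
```

Only fault.fs.zfs.device matches, so going by the list alone the spare should have been recruited.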

Clearly ZFS thinks the device is unavailable (which is accurate).

And "pfexec zpool replace testpool c2t3d1 c2t3d2" works fine, as shown here:

gdam...@tabasco{33}> pfexec zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      c1t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: testpool
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: resilver completed after 0h0m with 0 errors on Mon Apr 5 08:39:57 2010
config:

    NAME          STATE     READ WRITE CKSUM
    testpool      DEGRADED     0     0     0
      raidz1-0    DEGRADED     0     0     0
        c2t3d0    ONLINE       0     0     0
        spare-1   DEGRADED     0     0     0
          c2t3d1  UNAVAIL      9   132     0  cannot open
          c2t3d2  ONLINE       0     0     0  20.8M resilvered
    spares
      c2t3d2      INUSE     currently in use

errors: No known data errors
gdam...@tabasco{34}>


Everything seems to be correct *except* that ZFS isn't automatically doing the replace operation with the hot spare.

It feels to me like this is possibly a ZFS bug --- perhaps ZFS is expecting a specific set of FMA faults that only sd delivers? (Recall this is with a different target device.)

    - Garrett

- Eric

--
Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
