*Platform:*

    * OpenSolaris snv79 on an older beige-box Intel x86
    * Apple XRaid disk box, with 7 JBOD disks
    * LSI FC controller -
      http://www.lsi.com/storage_home/products_home/host_bus_adapters/fibre_channel_hbas/lsi7404eplc/index.html?remote=1&locale=EN

*Description:*
When a drive is yanked, this happy pool:

            NAME                       STATE     READ WRITE CKSUM
            datapool                   ONLINE       0     0     0
              raidz1                   ONLINE       0     0     0
                c9t60003930000214EEd0  ONLINE       0     0     0
                c9t60003930000214EEd1  ONLINE       0     0     0
                c9t60003930000214EEd2  ONLINE       0     0     0
                c9t60003930000214EEd3  ONLINE       0     0     0
                c9t60003930000214EEd4  ONLINE       0     0     0
                c9t60003930000214EEd5  ONLINE       0     0     0
                c9t60003930000214EEd6  ONLINE       0     0     0
      


Turns into this unhappy pool that cannot reflect reality:

            NAME                       STATE     READ WRITE CKSUM
            datapool                   DEGRADED     0     0     0
              raidz1                   DEGRADED     0     0     0
                c9t60003930000214EEd0  ONLINE       0     0     0
                c9t60003930000214EEd1  ONLINE       0     0     0
                c9t60003930000214EEd2  ONLINE       0     0     0
                c9t60003930000214EEd3  ONLINE       0     0     0
                c9t60003930000214EEd4  ONLINE       0     0     0
                c9t60003930000214EEd6  FAULTED      0     0     0  corrupted data
                c9t60003930000214EEd6  ONLINE       0     0     0
      

Note that c9t60003930000214EEd6, impossibly, appears _*TWICE*_ in the list!

After replacing the yanked disk with a mostly-blank disk (one with some
leftover ZFS labels on it from another experiment), I'm unable to offline
or replace c9t60003930000214EEd5, or generally do anything that would
bring the pool out of the degraded state.
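
For reference, these are the kinds of commands that fail; consider this
an illustrative sketch rather than a verbatim transcript:

        # None of these get the pool out of DEGRADED.
        zpool offline datapool c9t60003930000214EEd5
        zpool replace datapool c9t60003930000214EEd5
        zpool clear datapool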

If I export/import the pool, it looks like this:

            NAME                       STATE     READ WRITE CKSUM
            datapool                   DEGRADED     0     0     0
              raidz1                   DEGRADED     0     0     0
                c9t60003930000214EEd0  ONLINE       0     0     0
                c9t60003930000214EEd1  ONLINE       0     0     0
                c9t60003930000214EEd2  ONLINE       0     0     0
                c9t60003930000214EEd3  ONLINE       0     0     0
                c9t60003930000214EEd4  ONLINE       0     0     0
                6898074116173351320    FAULTED      0     0     0  was /dev/dsk/c9t60003930000214EEd6s0
                c9t60003930000214EEd6  ONLINE       0     0     0

    errors: No known data errors
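
For completeness, the export/import cycle is nothing more than the
following (a sketch; no hardware was plugged or unplugged in between),
and afterwards the FAULTED entry shows up by its numeric GUID instead of
a device name:

        zpool export datapool
        zpool import datapool
        zpool status datapool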
      


*Some thoughts:*

    * Has anyone else seen this?
    * Having a device in the raidz list twice is clearly a problem!
    * Being able to change the device list by exporting/importing
      (without plugging/unplugging any hardware) is clearly a problem, too!
    * Might the LSI driver or the XRaid re-order the d[0-9] devices when
      one of them goes away?
    * We're thinking of various other ways to poke at this problem: trying
      a newer build of OpenSolaris (b85, probably), and blanking
      drives-used-in-other-experiments more aggressively (see the sketch
      after this list).
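
For the more-aggressive blanking idea, the plan is roughly the usual
trick of zeroing the regions where ZFS keeps its labels (two 256 KB
copies at the front of the device and two at the end). A rough sketch,
run under bash or ksh; the device node and size below are purely
illustrative placeholders:

        # Whole-disk node of the replacement disk (p0 = entire disk on x86).
        DISK=/dev/rdsk/c9t60003930000214EEd5p0

        # Disk size in MB (placeholder; read the real value from format(1M)).
        SIZE_MB=238475

        # Wipe the front of the disk (ZFS labels L0/L1 live in the first 512 KB).
        dd if=/dev/zero of=$DISK bs=1024k count=10

        # Wipe the tail of the disk (labels L2/L3 live in the last 512 KB);
        # dd stops with an error once it reaches the end of the device,
        # which is fine for this purpose.
        dd if=/dev/zero of=$DISK bs=1024k oseek=$((SIZE_MB - 10))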


Thanks,
-Luke
