Just to report back to the list... Sorry for the lengthy post.

So I've tested the iSCSI-based ZFS mirror on Sol 10u4, and it does more or less
work as expected. If I unplug one side of the mirror (unplug or power down one
of the iSCSI targets), I/O to the zpool stops for a while, perhaps a minute,
and then things free up again. zpool commands get unworkably slow in the
meantime, and error messages fly by on the console like fire ants running from
a flood. Worst of all, after plugging the faulted mirror back in (without first
removing it from the pool), it's very hard to bring the faulted device back
online; the transcript follows the setup sketch below.
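For anyone who wants to reproduce this, the setup is just the stock Solaris 10
iSCSI initiator recipe, something like the sketch below. It's a sketch, not a
verbatim transcript: the discovery addresses are placeholders for your own
target portals.

    # point the initiator at both target portals (addresses are made up)
    iscsiadm modify discovery --sendtargets enable
    iscsiadm add discovery-address 192.168.10.1:3260
    iscsiadm add discovery-address 192.168.10.2:3260

    # create device nodes for the new LUNs, then mirror one LUN from each target
    devfsadm -i iscsi
    zpool create test mirror c2t1d0 c2t2d0

With the pool built that way, here's the attempt to bring the faulted device
back: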
prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr 8 16:34:08 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t1d0  FAULTED      0 2.88K     0  corrupted data
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

>>>>>>>>> Comment: why are there now two instances of c2t1d0?? <<<<<<<<<<

prudhoe # zpool replace test c2t2d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool replace -f test c2t2d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool remove test c2t2d0
cannot remove c2t2d0: no such device in pool

prudhoe # zpool offline test c2t2d0
cannot offline c2t2d0: no such device in pool

prudhoe # zpool online test c2t2d0
cannot online c2t2d0: no such device in pool

>>>>>>>>>> OK, get more drastic <<<<<<<<<<<<<<

prudhoe # zpool clear test
prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr 8 16:34:08 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t1d0  FAULTED      0     0     0  corrupted data
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

>>>>>>>>>>>>>>>>>>>>> Frustration setting in. The error counts are zero, but
>>>>>>>>>>>>>>>>>>>>> still two instances of c2t1d0 listed... <<<<<<<<<<<<<<<<

prudhoe # zpool export test
prudhoe # zpool import test
prudhoe # zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
test   12.9G  9.54G  3.34G  74%  ONLINE  -

prudhoe # zpool status
  pool: test
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.11% done, 0h20m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

>>>>> Finally resilvering with the right devices. The thing I really don't
>>>>> like here is that the pool had to be exported and then imported to make
>>>>> this work. For an NFS server, this is not really acceptable. Now I know
>>>>> this is ol' Solaris 10u4, but still, I'm surprised I needed to
>>>>> export/import the pool to get it working correctly again. Anyone know
>>>>> what I did wrong? Is there a canonical way to online the previously
>>>>> faulted device?

Anyway, it looks like for now I can get some sort of HA out of this iSCSI
mirror. The other pluses are that the pool can self-heal, and reads will be
spread across both units.
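One small convenience: every recovery path here ends in a long resilver, so
I've been using a throwaway loop like this to block a script until it
finishes. Just a sketch; it greps the human-readable status output, so treat
it accordingly:

    #!/bin/sh
    # poll once a minute until zpool status stops reporting a resilver
    POOL=test
    while /usr/sbin/zpool status $POOL | \
            grep 'resilver in progress' >/dev/null 2>&1
    do
            sleep 60
    done
    echo "resilver finished on $POOL"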
Cheers,
Jon

---

P.S. Playing with this some more before sending this message: if you can
detach the faulted mirror before putting it back online, it all works well.
Just hope that nothing bounces on your network when you have a failure:

---->>>> unplug one iscsi mirror, then:

prudhoe # zpool status -v
  pool: test
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: scrub completed with 0 errors on Wed Apr 9 14:18:45 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c2t2d0  UNAVAIL      4    91     0  cannot open
            c2t1d0  ONLINE       0     0     0

errors: No known data errors

prudhoe # zpool detach test c2t2d0
prudhoe # zpool status -v
  pool: test
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Apr 9 14:18:45 2008
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
        c2t1d0      ONLINE       0     0     0

errors: No known data errors

----->>>> replug the downed mirror, and:

prudhoe # zpool attach test c2t1d0 c2t2d0
prudhoe # zpool status -v
  pool: test
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.04% done, 2h17m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0

errors: No known data errors

Voilà!

Jon
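P.P.S. While the pool is running on the surviving half, there is no redundancy
at all, so it's worth having something watch it. A rough sketch of a watchdog
that could run from cron; the recipient and schedule are placeholders:

    #!/bin/sh
    # mail root if zpool status -x reports anything but a clean bill of health
    STATUS=`/usr/sbin/zpool status -x`
    if [ "$STATUS" != "all pools are healthy" ]; then
            echo "$STATUS" | mailx -s "zpool trouble on `hostname`" root
    fi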