Hi Richard,

Thanks, I'll give that a try. I think I just had a kernel dump while trying to boot this system back up, though; I don't think it likes it if the iSCSI targets aren't available during boot. Again, that rings a bell, so I'll go see if that's another known bug.

Changing that setting on the fly didn't seem to help; if anything, things are worse this time around. I changed the timeout to 15 seconds, but didn't restart any services:

# echo iscsi_rx_max_window/D | mdb -k
iscsi_rx_max_window:
iscsi_rx_max_window:            180
# echo iscsi_rx_max_window/W0t15 | mdb -kw
iscsi_rx_max_window:            0xb4    =       0xf
# echo iscsi_rx_max_window/D | mdb -k
iscsi_rx_max_window:
iscsi_rx_max_window:            15

After making those changes and repeating the test, offlining an iSCSI volume hung all the commands running on the pool. I had three ssh sessions open, running the following:

# zpool iostat -v iscsipool 10 100
# format < /dev/null
# time zpool status

They hung for what felt like a minute or so. After that, the CIFS copy timed out.

After the CIFS copy timed out, I tried immediately restarting it. It took a few more seconds, but restarted with no problem. Within a few seconds of that restart, iostat recovered, and format returned its result too.

Around 30 seconds later, zpool status reported two drives, paused again, then showed the status of the third:

# time zpool status
  pool: iscsipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Tue Dec 2 16:39:21 2008
config:

    NAME                                       STATE     READ WRITE CKSUM
    iscsipool                                  ONLINE       0     0     0
      raidz1                                   ONLINE       0     0     0
        c2t600144F04933FF6C00005056967AC800d0  ONLINE       0     0     0  15K resilvered
        c2t600144F04934FAB300005056964D9500d0  ONLINE       0     0     0  15K resilvered
        c2t600144F04934119E000050569675FF00d0  ONLINE       0   200     0  24K resilvered

errors: No known data errors

real    3m51.774s
user    0m0.015s
sys     0m0.100s

Repeating that a few seconds later gives:

# time zpool status
  pool: iscsipool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Tue Dec 2 16:39:21 2008
config:

    NAME                                       STATE     READ WRITE CKSUM
    iscsipool                                  DEGRADED     0     0     0
      raidz1                                   DEGRADED     0     0     0
        c2t600144F04933FF6C00005056967AC800d0  ONLINE       0     0     0  15K resilvered
        c2t600144F04934FAB300005056964D9500d0  ONLINE       0     0     0  15K resilvered
        c2t600144F04934119E000050569675FF00d0  UNAVAIL      3 5.80K     0  cannot open

errors: No known data errors

real    0m0.272s
user    0m0.029s
sys     0m0.169s

On Tue, Dec 2, 2008 at 3:58 PM, Richard Elling <[EMAIL PROTECTED]> wrote:
......
> iSCSI timeout is set to 180 seconds in the client code. The only way
> to change is to recompile it, or use mdb. Since you have this test rig
> setup, and I don't, do you want to experiment with this timeout?
> The variable is actually called "iscsi_rx_max_window" so if you do
>     echo iscsi_rx_max_window/D | mdb -k
> you should see "180"
> Change it using something like:
>     echo iscsi_rx_max_window/W0t30 | mdb -kw
> to set it to 30 seconds.
> -- richard
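
A side note on reading the mdb write confirmation above: mdb reports the old and new values in hex ("0xb4 = 0xf"), while the /D reads print decimal, and the "0t" prefix in the /W expression marks the new value as decimal. The following is only a rough sketch for checking those numbers and building the write expression. The helper names hex_to_dec and mdb_write_expr are made up for illustration, and nothing here runs mdb itself.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdb or Solaris); they only format text.

# Convert a hex constant such as 0xb4 to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Build the mdb expression that writes a decimal value to a kernel variable.
# $1 = variable name, $2 = new value in decimal (mdb's 0t prefix means decimal).
mdb_write_expr() {
    printf '%s/W0t%d\n' "$1" "$2"
}

hex_to_dec 0xb4                          # old value reported by mdb
hex_to_dec 0xf                           # new value reported by mdb
mdb_write_expr iscsi_rx_max_window 15    # expression to pipe into "mdb -kw"
```

The last line only prints the expression; to apply it on a live system it would still need to be piped into "mdb -kw" by hand, as in the transcript above.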