sorry... I'm confusing the issue... the messages that I showed below were from a iSCSI Disconnect test that we had done while connecting to the EqualLogic. I will re-setup my test tot he tgtd to see what it and the initiator show. I will take you suggestion of writting to dev/null.
On Apr 14, 2011, at 7:39 AM, Joe Hoot wrote: > Here's what I am seeing (roughly... with lines scraped in between.. and I'm > only grepping for connect7/session7): > > Apr 11 13:22:24 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > ...lots of the above types of messages... > Apr 11 13:22:24 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > Apr 11 13:22:24 oim61024009 kernel: connection7:0: > iscsi_check_transport_timeouts Setting next tmo 2279563 > Apr 11 13:22:24 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > ...lots of the above types of messages... > Apr 11 13:22:24 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > Apr 11 13:22:24 oim61024009 kernel: connection7:0: > iscsi_check_transport_timeouts Setting next tmo 2279563 > Apr 11 13:22:24 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > ...lots of the above types of messages... > Apr 11 13:22:25 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > Apr 11 13:22:25 oim61024009 kernel: connection7:0: > iscsi_check_transport_timeouts Sending nopout as ping > Apr 11 13:22:25 oim61024009 kernel: connection7:0: > iscsi_check_transport_timeouts Setting next tmo 2280813 > Apr 11 13:22:25 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > ...lots of the above types of messages... > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > Apr 11 13:22:30 oim61024009 kernel: connection7:0: ping timeout of 5 secs > expired, recv timeout 5, last rx 2278313, last ping 2279563, now 2280813 > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 2592 0 size 4096 xmit > Apr 11 13:22:30 oim61024009 kernel: connection7:0: detected conn error (1011) > Apr 11 13:22:30 oim61024009 kernel: session7: iscsi_data_xmit Tx suspended! > Apr 11 13:22:30 oim61024009 kernel: session7: iscsi_start_session_recovery > blocking session > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > pending sc df493a80 itt 0x108 > ...etc..etc.. > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > pending sc d81de300 itt 0x159 > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > requeued sc d89b6800 itt 0xd8 > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_cleanup_task > pending r2t dropped > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > requeued sc ca305ac0 itt 0xd9 > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_cleanup_task > pending r2t dropped > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > requeued sc d81de440 itt 0xdb > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_cleanup_task > pending r2t dropped > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > requeued sc ca305700 itt 0xd7 > ...pattern continues.. > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing > requeued sc d847ee40 itt 0xfa > Apr 11 13:22:30 oim61024009 kernel: connection7:0: iscsi_tcp_cleanup_task > pending r2t dropped > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing in > progress sc c9ac8840 itt 0xfb > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing in > progress sc d1e04800 itt 0xfc > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing in > progress sc df0cdc00 itt 0xfd > ...pattern continues... > Apr 11 13:22:30 oim61024009 kernel: session7: fail_all_commands failing in > progress sc d8811700 itt 0x107 > Apr 11 13:22:30 oim61024009 kernel: session7: flush_control_queues flushing > pending mgmt task itt 0x15a > Apr 11 13:22:30 oim61024009 kernel: session7: iscsi2_queuecommand cmd 0x2a > rejected (7) > Apr 11 13:22:32 oim61024009 kernel: connection7:0: iscsi_tcp_hdr_recv_prep > (digest disabled) > Apr 11 13:22:32 oim61024009 kernel: connection7:0: iscsi_tcp_task_init mtask > deq [itt 0x0] > Apr 11 13:22:32 oim61024009 kernel: connection7:0: > iscsi_sw_tcp_send_hdr_prep digest disabled > Apr 11 13:22:32 oim61024009 kernel: connection7:0: > iscsi_sw_tcp_send_linear_data_prep datalen=208 digest disabled > Apr 11 13:22:32 oim61024009 kernel: session7: iscsi_prep_mgmt_task mgmtpdu > [op 0x3 hdr->itt 0x10000000 datalen 208] > Apr 11 13:22:32 oim61024009 kernel: connection7:0: iscsi_tcp_segment_done > copied 0 0 size 48 xmit > > > > > > > On Apr 14, 2011, at 6:45 AM, Mike Christie wrote: > >> On 04/14/2011 05:12 AM, Joe Hoot wrote: >>> Mike, >>> >>> I'm still digging into this "iSCSI Disconnect" issue that we've been >>> dealing with for about two years here in SUNY. The iscsi-initiator that >>> I'm using is this: >>> >>> iscsi-initiator-utils-6.2.0.872-6.0.2.el5 >>> >>> I've been running dt tests to the equallogic in our Oracle VM environment >>> with the following set: >>> >>> echo 1> /sys/module/libiscsi2/parameters/debug_libiscsi >>> echo 1> /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp >>> echo 1> /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp >>> >>> When I try to look for "echo 1> >>> /sys/module/libiscsi/parameters/debug_libiscsi_eh" I don't see that as an >>> available option. Are the debug lines above correct? Are there other >>> debug messages that I should be gathering? >>> >> >> It is only in newer kernels. debug_libiscsi will print out what >> debug_libiscsi_eh does plus more. >> >> >>> [root@oim61024001 src]# ls -l /sys/module/libiscsi >>> libiscsi2/ libiscsi_tcp/ >>> >>> >>> We have done numerous tests and different hardware. I have eliminated OVM >>> and have been testing this now on OEL5.6. >>> >>> The scenario that seems to break connections quickly is the following: >>> >>> - cluster of 5 nodes using OCFS2 (I will be trying to rule OCFS2 and >>> dm-multipath out shortly) >>> - 5-7 iSCSI volumes connected to each of those 5 nodes >>> - 4 threads of dt running against each of 5-7 volumes from each host = 28 >>> threads of dt slamming the volumes per host = 140 threads of dt per cluster. >>> - test only lasts about 2-5 minutes before I start seeing ping timeouts and >>> disconnects. >>> >>> The issue that we've seen thus far is mainly with EqualLogic and >>> open-iscsi. From what EQL is telling us, the initiator is "aborting" the >>> connection. But from the initiator-side, we just see "ping timeout" >>> messages (and then the connection eventually goes away). >>> >> >> When the ping times out, the initiator will abort/drop the connection. >> It will then try to relogin. So you should probably be seeing the >> ping/nop timeout message, then a conn error 1011, then a message about >> reconnected in X retries, right? >> >> >> The thing is that below with tgtd we seem to be seeing scsi commands >> timing out. On the initiator logs did you see ping/nop timeout messages >> when you hit the problem below with tgtd? >> >> With tgtd let's take take the disks out of it and just test iscsi. >> >> So do something like this: >> >> tgtadm --op new --mode target --tid 1 -T iqn.2001-04.com.pertest >> tgtadm --op new --mode logicalunit --tid 1 --lun 1 --bstype null -b >> /dev/null >> tgtadm --op bind --mode target --tid 1 -I ALL >> >> This of course will not do any real IO so you cannot do tests that >> verify day. Just do really harsh heavy read/write tests. >> >> >> When you do IO to fake disks do you see any ping/nop or scsi timeout errors? >> >> >>> We recently saw a thread (Apr 4) regarding cfq scheduler. So we quickly >>> tested noop and deadline, just to see if that would change anything-- it >>> didn't. >>> >>> So my most recent test was to try out a different target, just to see if we >>> could rule out the EqualLogic. Each time I changed from EQLX to the tgtd, >>> I would reset (and rescan in my volumes) the iscsid.conf's "FastAbort = >>> No", or yes (if I was testing tgtd), to conform with EQLX's best practices. >>> >> >> The abort setting is not going to help at all for any target. That >> basically kicks in after you have already hit the problem you are >> hitting. It just speeds up the handling of the problem. >> >> >>> So at this point, after I get dm-multipath and OCFS2 out of the equation, >>> it will be down to a tartget + kernel/initiator + I/O scheduler and I want >>> to make sure that I'm getting all the debug information that I might need >>> to analyze what is going on. >>> >>> Are there any other debug tunables that you might recommend adding to my >>> script? >> >> No, the ones you have above are all there is. Maybe tcpdump -w iscsi.out >> -i ethXYZ >> >> Did you have the initiator /var/log/messages with debugging on when you >> hit the problem below? If so send them. >> >>> >>> On Apr 14, 2011, at 12:02 AM, Mike Christie wrote: >>> >>>> On 04/12/2011 12:43 PM, Joe Hoot wrote: >>>>> I'm trying to understand the following messages: >>>>> >>>>> Apr 12 13:23:52 oim60025001 tgtd: conn_close(88) connection closed >>>>> 0x94d80c4 1 >>>>> .... lots of the above messages... >>>>> Apr 12 13:37:19 oim60025001 tgtd: abort_task_set(979) found 271 0 >>>>> .... lots of the above messages... >>>>> Apr 12 13:37:27 oim60025001 tgtd: abort_cmd(955) found e9 e >>>>> .... lots of the above messages... >>>>> Apr 12 13:39:08 oim60025001 tgtd: conn_close(88) connection closed >>>>> 0xa9ab8ec 1 >>>>> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "open-iscsi" group. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]. >> For more options, visit this group at >> http://groups.google.com/group/open-iscsi?hl=en. >> > > -- > You received this message because you are subscribed to the Google Groups > "open-iscsi" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/open-iscsi?hl=en. > -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
