Re: Target or Initiator closing connection?

Joe Hoot Thu, 14 Apr 2011 04:40:09 -0700

Here's what I am seeing (roughly... with lines scraped in between.. and I'm 
only grepping for connect7/session7):


Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
...lots of the above types of messages...
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: 
iscsi_check_transport_timeouts Setting next tmo 2279563
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
...lots of the above types of messages...
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: 
iscsi_check_transport_timeouts Setting next tmo 2279563
Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
...lots of the above types of messages...
Apr 11 13:22:25 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
Apr 11 13:22:25 oim61024009 kernel:  connection7:0: 
iscsi_check_transport_timeouts Sending nopout as ping
Apr 11 13:22:25 oim61024009 kernel:  connection7:0: 
iscsi_check_transport_timeouts Setting next tmo 2280813
Apr 11 13:22:25 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
...lots of the above types of messages...
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: ping timeout of 5 secs 
expired, recv timeout 5, last rx 2278313, last ping 2279563, now 2280813
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 2592 0 size 4096 xmit
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: detected conn error (1011)
Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi_data_xmit Tx suspended!
Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi_start_session_recovery 
blocking session
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
pending sc df493a80 itt 0x108
...etc..etc..
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
pending sc d81de300 itt 0x159
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
requeued sc d89b6800 itt 0xd8
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
pending r2t dropped
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
requeued sc ca305ac0 itt 0xd9
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
pending r2t dropped
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
requeued sc d81de440 itt 0xdb
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
pending r2t dropped
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
requeued sc ca305700 itt 0xd7
...pattern continues..
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
requeued sc d847ee40 itt 0xfa
Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
pending r2t dropped
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
progress sc c9ac8840 itt 0xfb
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
progress sc d1e04800 itt 0xfc
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
progress sc df0cdc00 itt 0xfd
...pattern continues...
Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
progress sc d8811700 itt 0x107
Apr 11 13:22:30 oim61024009 kernel:  session7: flush_control_queues flushing 
pending mgmt task itt 0x15a
Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi2_queuecommand cmd 0x2a 
rejected (7)
Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_hdr_recv_prep 
(digest disabled)
Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_task_init mtask 
deq [itt 0x0]
Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_sw_tcp_send_hdr_prep 
digest disabled
Apr 11 13:22:32 oim61024009 kernel:  connection7:0: 
iscsi_sw_tcp_send_linear_data_prep datalen=208 digest disabled
Apr 11 13:22:32 oim61024009 kernel:  session7: iscsi_prep_mgmt_task mgmtpdu [op 
0x3 hdr->itt 0x10000000 datalen 208]
Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
copied 0 0 size 48 xmit






On Apr 14, 2011, at 6:45 AM, Mike Christie wrote:

> On 04/14/2011 05:12 AM, Joe Hoot wrote:
>> Mike,
>> 
>> I'm still digging into this "iSCSI Disconnect" issue that we've been dealing 
>> with for about two years here in SUNY.  The iscsi-initiator that I'm using 
>> is this:
>> 
>>   iscsi-initiator-utils-6.2.0.872-6.0.2.el5
>> 
>> I've been running dt tests to the equallogic in our Oracle VM environment 
>> with the following set:
>> 
>>   echo 1>  /sys/module/libiscsi2/parameters/debug_libiscsi
>>   echo 1>  /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
>>   echo 1>  /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
>> 
>> When I try to look for "echo 1>  
>> /sys/module/libiscsi/parameters/debug_libiscsi_eh" I don't see that as an 
>> available option.  Are the debug lines above correct?  Are there other debug 
>> messages that I should be gathering?
>> 
> 
> It is only in newer kernels. debug_libiscsi will print out what 
> debug_libiscsi_eh does plus more.
> 
> 
>> [root@oim61024001 src]# ls -l /sys/module/libiscsi
>> libiscsi2/    libiscsi_tcp/
>> 
>> 
>> We have done numerous tests and different hardware.  I have eliminated OVM 
>> and have been testing this now on OEL5.6.
>> 
>> The scenario that seems to break connections quickly is the following:
>> 
>> - cluster of 5 nodes using OCFS2 (I will be trying to rule OCFS2 and 
>> dm-multipath out shortly)
>> - 5-7 iSCSI volumes connected to each of those 5 nodes
>> - 4 threads of dt running against each of 5-7 volumes from each host = 28 
>> threads of dt slamming the volumes per host = 140 threads of dt per cluster.
>> - test only lasts about 2-5 minutes before I start seeing ping timeouts and 
>> disconnects.
>> 
>> The issue that we've seen thus far is mainly with EqualLogic and open-iscsi. 
>>   From what EQL is telling us, the initiator is "aborting" the connection.  
>> But from the initiator-side, we just see "ping timeout" messages (and then 
>> the connection eventually goes away).
>> 
> 
> When the ping times out, the initiator will abort/drop the connection. 
> It will then try to relogin. So you should probably be seeing the 
> ping/nop timeout message, then a conn error 1011, then a message about 
> reconnected in X retries, right?
> 
> 
> The thing is that below with tgtd we seem to be seeing scsi commands 
> timing out. On the initiator logs did you see ping/nop timeout messages 
> when you hit the problem below with tgtd?
> 
> With tgtd let's take take the disks out of it and just test iscsi.
> 
> So do something like this:
> 
> tgtadm --op new --mode target --tid 1 -T iqn.2001-04.com.pertest
> tgtadm --op new --mode logicalunit --tid 1 --lun 1 --bstype null -b 
> /dev/null
> tgtadm --op bind --mode target --tid 1 -I ALL
> 
> This of course will not do any real IO so you cannot do tests that 
> verify day. Just do really harsh heavy read/write tests.
> 
> 
> When you do IO to fake disks do you see any ping/nop or scsi timeout errors?
> 
> 
>> We recently saw a thread (Apr 4) regarding cfq scheduler.  So we quickly 
>> tested noop and deadline, just to see if that would change anything-- it 
>> didn't.
>> 
>> So my most recent test was to try out a different target, just to see if we 
>> could rule out the EqualLogic.  Each time I changed from EQLX to the tgtd, I 
>> would reset (and rescan in my volumes) the iscsid.conf's "FastAbort = No", 
>> or yes (if I was testing tgtd), to conform with EQLX's best practices.
>> 
> 
> The abort setting is not going to help at all for any target. That 
> basically kicks in after you have already hit the problem you are 
> hitting. It just speeds up the handling of the problem.
> 
> 
>> So at this point, after I get dm-multipath and OCFS2 out of the equation, it 
>> will be down to a tartget + kernel/initiator + I/O scheduler and I want to 
>> make sure that I'm getting all the debug information that I might need to 
>> analyze what is going on.
>> 
>> Are there any other debug tunables that you might recommend adding to my 
>> script?
> 
> No, the ones you have above are all there is. Maybe tcpdump -w iscsi.out 
> -i ethXYZ
> 
> Did you have the initiator /var/log/messages with debugging on when you 
> hit the problem below? If so send them.
> 
>> 
>> On Apr 14, 2011, at 12:02 AM, Mike Christie wrote:
>> 
>>> On 04/12/2011 12:43 PM, Joe Hoot wrote:
>>>> I'm trying to understand the following messages:
>>>> 
>>>>   Apr 12 13:23:52 oim60025001 tgtd: conn_close(88) connection closed 
>>>> 0x94d80c4 1
>>>>   .... lots of the above messages...
>>>>   Apr 12 13:37:19 oim60025001 tgtd: abort_task_set(979) found 271 0
>>>>   .... lots of the above messages...
>>>>   Apr 12 13:37:27 oim60025001 tgtd: abort_cmd(955) found e9 e
>>>>   .... lots of the above messages...
>>>>   Apr 12 13:39:08 oim60025001 tgtd: conn_close(88) connection closed 
>>>> 0xa9ab8ec 1
>>>> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/open-iscsi?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.

Re: Target or Initiator closing connection?

Reply via email to