sorry... I'm confusing the issue...   the messages that I showed below were 
from a iSCSI Disconnect test that we had done while connecting to the 
EqualLogic.  I will re-setup my test tot he tgtd to see what it and the 
initiator show.  I will take you suggestion of writting to dev/null.



On Apr 14, 2011, at 7:39 AM, Joe Hoot wrote:

> Here's what I am seeing (roughly... with lines scraped in between.. and I'm 
> only grepping for connect7/session7):
> 
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> ...lots of the above types of messages...
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: 
> iscsi_check_transport_timeouts Setting next tmo 2279563
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> ...lots of the above types of messages...
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: 
> iscsi_check_transport_timeouts Setting next tmo 2279563
> Apr 11 13:22:24 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> ...lots of the above types of messages...
> Apr 11 13:22:25 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> Apr 11 13:22:25 oim61024009 kernel:  connection7:0: 
> iscsi_check_transport_timeouts Sending nopout as ping
> Apr 11 13:22:25 oim61024009 kernel:  connection7:0: 
> iscsi_check_transport_timeouts Setting next tmo 2280813
> Apr 11 13:22:25 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> ...lots of the above types of messages...
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: ping timeout of 5 secs 
> expired, recv timeout 5, last rx 2278313, last ping 2279563, now 2280813
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 2592 0 size 4096 xmit
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: detected conn error (1011)
> Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi_data_xmit Tx suspended!
> Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi_start_session_recovery 
> blocking session
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> pending sc df493a80 itt 0x108
> ...etc..etc..
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> pending sc d81de300 itt 0x159
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> requeued sc d89b6800 itt 0xd8
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
> pending r2t dropped
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> requeued sc ca305ac0 itt 0xd9
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
> pending r2t dropped
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> requeued sc d81de440 itt 0xdb
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
> pending r2t dropped
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> requeued sc ca305700 itt 0xd7
> ...pattern continues..
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing 
> requeued sc d847ee40 itt 0xfa
> Apr 11 13:22:30 oim61024009 kernel:  connection7:0: iscsi_tcp_cleanup_task 
> pending r2t dropped
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
> progress sc c9ac8840 itt 0xfb
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
> progress sc d1e04800 itt 0xfc
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
> progress sc df0cdc00 itt 0xfd
> ...pattern continues...
> Apr 11 13:22:30 oim61024009 kernel:  session7: fail_all_commands failing in 
> progress sc d8811700 itt 0x107
> Apr 11 13:22:30 oim61024009 kernel:  session7: flush_control_queues flushing 
> pending mgmt task itt 0x15a
> Apr 11 13:22:30 oim61024009 kernel:  session7: iscsi2_queuecommand cmd 0x2a 
> rejected (7)
> Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_hdr_recv_prep 
> (digest disabled)
> Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_task_init mtask 
> deq [itt 0x0]
> Apr 11 13:22:32 oim61024009 kernel:  connection7:0: 
> iscsi_sw_tcp_send_hdr_prep digest disabled
> Apr 11 13:22:32 oim61024009 kernel:  connection7:0: 
> iscsi_sw_tcp_send_linear_data_prep datalen=208 digest disabled
> Apr 11 13:22:32 oim61024009 kernel:  session7: iscsi_prep_mgmt_task mgmtpdu 
> [op 0x3 hdr->itt 0x10000000 datalen 208]
> Apr 11 13:22:32 oim61024009 kernel:  connection7:0: iscsi_tcp_segment_done 
> copied 0 0 size 48 xmit
> 
> 
> 
> 
> 
> 
> On Apr 14, 2011, at 6:45 AM, Mike Christie wrote:
> 
>> On 04/14/2011 05:12 AM, Joe Hoot wrote:
>>> Mike,
>>> 
>>> I'm still digging into this "iSCSI Disconnect" issue that we've been 
>>> dealing with for about two years here in SUNY.  The iscsi-initiator that 
>>> I'm using is this:
>>> 
>>>  iscsi-initiator-utils-6.2.0.872-6.0.2.el5
>>> 
>>> I've been running dt tests to the equallogic in our Oracle VM environment 
>>> with the following set:
>>> 
>>>  echo 1>  /sys/module/libiscsi2/parameters/debug_libiscsi
>>>  echo 1>  /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
>>>  echo 1>  /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
>>> 
>>> When I try to look for "echo 1>  
>>> /sys/module/libiscsi/parameters/debug_libiscsi_eh" I don't see that as an 
>>> available option.  Are the debug lines above correct?  Are there other 
>>> debug messages that I should be gathering?
>>> 
>> 
>> It is only in newer kernels. debug_libiscsi will print out what
>> debug_libiscsi_eh does plus more.
>> 
>> 
>>> [root@oim61024001 src]# ls -l /sys/module/libiscsi
>>> libiscsi2/    libiscsi_tcp/
>>> 
>>> 
>>> We have done numerous tests and different hardware.  I have eliminated OVM 
>>> and have been testing this now on OEL5.6.
>>> 
>>> The scenario that seems to break connections quickly is the following:
>>> 
>>> - cluster of 5 nodes using OCFS2 (I will be trying to rule OCFS2 and 
>>> dm-multipath out shortly)
>>> - 5-7 iSCSI volumes connected to each of those 5 nodes
>>> - 4 threads of dt running against each of 5-7 volumes from each host = 28 
>>> threads of dt slamming the volumes per host = 140 threads of dt per cluster.
>>> - test only lasts about 2-5 minutes before I start seeing ping timeouts and 
>>> disconnects.
>>> 
>>> The issue that we've seen thus far is mainly with EqualLogic and 
>>> open-iscsi.   From what EQL is telling us, the initiator is "aborting" the 
>>> connection.  But from the initiator-side, we just see "ping timeout" 
>>> messages (and then the connection eventually goes away).
>>> 
>> 
>> When the ping times out, the initiator will abort/drop the connection.
>> It will then try to relogin. So you should probably be seeing the
>> ping/nop timeout message, then a conn error 1011, then a message about
>> reconnected in X retries, right?
>> 
>> 
>> The thing is that below with tgtd we seem to be seeing scsi commands
>> timing out. On the initiator logs did you see ping/nop timeout messages
>> when you hit the problem below with tgtd?
>> 
>> With tgtd let's take take the disks out of it and just test iscsi.
>> 
>> So do something like this:
>> 
>> tgtadm --op new --mode target --tid 1 -T iqn.2001-04.com.pertest
>> tgtadm --op new --mode logicalunit --tid 1 --lun 1 --bstype null -b
>> /dev/null
>> tgtadm --op bind --mode target --tid 1 -I ALL
>> 
>> This of course will not do any real IO so you cannot do tests that
>> verify day. Just do really harsh heavy read/write tests.
>> 
>> 
>> When you do IO to fake disks do you see any ping/nop or scsi timeout errors?
>> 
>> 
>>> We recently saw a thread (Apr 4) regarding cfq scheduler.  So we quickly 
>>> tested noop and deadline, just to see if that would change anything-- it 
>>> didn't.
>>> 
>>> So my most recent test was to try out a different target, just to see if we 
>>> could rule out the EqualLogic.  Each time I changed from EQLX to the tgtd, 
>>> I would reset (and rescan in my volumes) the iscsid.conf's "FastAbort = 
>>> No", or yes (if I was testing tgtd), to conform with EQLX's best practices.
>>> 
>> 
>> The abort setting is not going to help at all for any target. That
>> basically kicks in after you have already hit the problem you are
>> hitting. It just speeds up the handling of the problem.
>> 
>> 
>>> So at this point, after I get dm-multipath and OCFS2 out of the equation, 
>>> it will be down to a tartget + kernel/initiator + I/O scheduler and I want 
>>> to make sure that I'm getting all the debug information that I might need 
>>> to analyze what is going on.
>>> 
>>> Are there any other debug tunables that you might recommend adding to my 
>>> script?
>> 
>> No, the ones you have above are all there is. Maybe tcpdump -w iscsi.out
>> -i ethXYZ
>> 
>> Did you have the initiator /var/log/messages with debugging on when you
>> hit the problem below? If so send them.
>> 
>>> 
>>> On Apr 14, 2011, at 12:02 AM, Mike Christie wrote:
>>> 
>>>> On 04/12/2011 12:43 PM, Joe Hoot wrote:
>>>>> I'm trying to understand the following messages:
>>>>> 
>>>>>  Apr 12 13:23:52 oim60025001 tgtd: conn_close(88) connection closed 
>>>>> 0x94d80c4 1
>>>>>  .... lots of the above messages...
>>>>>  Apr 12 13:37:19 oim60025001 tgtd: abort_task_set(979) found 271 0
>>>>>  .... lots of the above messages...
>>>>>  Apr 12 13:37:27 oim60025001 tgtd: abort_cmd(955) found e9 e
>>>>>  .... lots of the above messages...
>>>>>  Apr 12 13:39:08 oim60025001 tgtd: conn_close(88) connection closed 
>>>>> 0xa9ab8ec 1
>>>>> 
>> 
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "open-iscsi" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/open-iscsi?hl=en.
>> 
> 
> --
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/open-iscsi?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.

Reply via email to