Mike,

I'm still digging into the "iSCSI Disconnect" issue that we've been dealing with for about two years here at SUNY. The iscsi-initiator that I'm using is:

iscsi-initiator-utils-6.2.0.872-6.0.2.el5

I've been running dt tests against the EqualLogic in our Oracle VM environment with the following set:

echo 1 > /sys/module/libiscsi2/parameters/debug_libiscsi
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp

When I look for "echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh", I don't see that as an available option. Are the debug lines above correct? Are there other debug messages that I should be gathering?

[root@oim61024001 src]# ls -l /sys/module/libiscsi
libiscsi2/ libiscsi_tcp/

We have done numerous tests on different hardware. I have eliminated OVM and am now testing on OEL5.6. The scenario that seems to break connections quickly is the following:

- cluster of 5 nodes using OCFS2 (I will be trying to rule OCFS2 and dm-multipath out shortly)
- 5-7 iSCSI volumes connected to each of those 5 nodes
- 4 threads of dt running against each of the 5-7 volumes from each host = up to 28 threads of dt slamming the volumes per host = up to 140 threads of dt per cluster
- the test only lasts about 2-5 minutes before I start seeing ping timeouts and disconnects

The issue that we've seen thus far is mainly with EqualLogic and open-iscsi. From what EQL is telling us, the initiator is "aborting" the connection. But from the initiator side, we just see "ping timeout" messages (and then the connection eventually goes away).

We recently saw a thread (Apr 4) regarding the cfq scheduler, so we quickly tested noop and deadline just to see if that would change anything; it didn't.

My most recent test was to try a different target, just to see if we could rule out the EqualLogic. Each time I switched between EQLX and tgtd, I reset the "FastAbort" setting in iscsid.conf (and rescanned my volumes): "No" for EQLX, to conform with EQLX's best practices, or "Yes" when testing tgtd.
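Since the module directory name differs between builds (libiscsi2 here, libiscsi in Mike's example), one way to avoid guessing is to turn on whatever debug_* parameters actually exist. The following is only a rough sketch: the function name enable_iscsi_debug and its optional root-directory argument are made up for illustration, and it assumes the parameters are writable files under <root>/<module>/parameters, as in the echo lines above.

```shell
#!/bin/sh
# Sketch: enable every debug_* parameter that is present for the iSCSI
# modules mentioned in this thread. The optional first argument is a
# hypothetical override of the sysfs module root, useful for dry runs.

enable_iscsi_debug() {
    root="${1:-/sys/module}"
    for mod in libiscsi libiscsi2 libiscsi_tcp iscsi_tcp; do
        dir="$root/$mod/parameters"
        if [ ! -d "$dir" ]; then
            # Module not loaded under this name (or it has no parameters).
            echo "skip $mod (no parameters directory)"
            continue
        fi
        for param in "$dir"/debug_*; do
            [ -e "$param" ] || continue   # glob matched nothing
            if echo 1 > "$param"; then
                echo "enabled $param"
            else
                echo "failed $param"
            fi
        done
    done
}
```

Running "enable_iscsi_debug" as root with no argument would act on /sys/module and print one line per parameter it found, which also answers which debug options this particular build exposes.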
So at this point, once I get dm-multipath and OCFS2 out of the equation, it will be down to target + kernel/initiator + I/O scheduler, and I want to make sure that I'm getting all the debug information that I might need to analyze what is going on. Are there any other debug tunables that you might recommend adding to my script?

On Apr 14, 2011, at 12:02 AM, Mike Christie wrote:

> On 04/12/2011 12:43 PM, Joe Hoot wrote:
>> I'm trying to understand the following messages:
>>
>> Apr 12 13:23:52 oim60025001 tgtd: conn_close(88) connection closed
>> 0x94d80c4 1
>> .... lots of the above messages...
>> Apr 12 13:37:19 oim60025001 tgtd: abort_task_set(979) found 271 0
>> .... lots of the above messages...
>> Apr 12 13:37:27 oim60025001 tgtd: abort_cmd(955) found e9 e
>> .... lots of the above messages...
>> Apr 12 13:39:08 oim60025001 tgtd: conn_close(88) connection closed
>> 0xa9ab8ec 1
>>
>> Does that typically mean that the target has closed the connection or the
>> initiator?
>>
>
> Might be best to ask the tgtd list, but I think due to the abort
> messages it is probably the initiator if you are using open-iscsi for
> the initiator and if you see the abort messages before conn_close ones.
>
> If you see abort ones first, then see conn close ones, then probably a
> scsi command is timing out. This causes the scsi layer to have the
> initiator abort the command. If the abort fails, the initiator could try
> a lun reset or target reset. If we cannot reset or abort the problem
> away we drop the connection/session.
>
> On the initiator if you did
>
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
>
> you would see a "wait for relogin" message in /var/log/messages then a
> "session reset succeeded" or "failing session reset: Could not log back
> into" message. The wait for relogin would match the conn close messages
> on the target. Then the success or failed messages indicated if we were
> able to relogin.
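
To line up the initiator-side messages with the conn_close entries on the target, the strings mentioned above can be pulled out of the log and counted. This is a minimal sketch, not a definitive tool: the function names iscsi_eh_events and iscsi_eh_summary are hypothetical, and the patterns are simply the phrases quoted in this thread, so the exact wording in a given kernel build is an assumption.

```shell
#!/bin/sh
# Sketch: extract the error-handling messages named in this thread from a
# log file. Defaults to /var/log/messages; pass another path to override.

iscsi_eh_events() {
    log="${1:-/var/log/messages}"
    # Print matching lines in log order so they can be compared by timestamp
    # with the abort and conn_close lines in the target's log.
    grep -E 'ping timeout|wait for relogin|session reset succeeded|failing session reset' "$log"
}

iscsi_eh_summary() {
    log="${1:-/var/log/messages}"
    for pat in 'ping timeout' 'wait for relogin' 'session reset succeeded' 'failing session reset'; do
        # grep -c prints 0 but exits non-zero when nothing matches.
        n=$(grep -c -- "$pat" "$log" || true)
        echo "$pat: $n"
    done
}
```

A "wait for relogin" count that roughly matches the number of conn_close lines on the target would be consistent with the explanation above; a mismatch would suggest looking elsewhere.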

--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
