On 08/01/2013 01:43 PM, Tracy Reed wrote: > On Wed, Jul 24, 2013 at 05:15:16PM PDT, Mike Christie spake thusly: >> Did you bring the target back up and if so did you do it with the same >> target name? > > Sorry for the delay in getting back, been travelling on business. But thanks > very much for the reply! > > Yes, I did bring the target back up and with the same name. Although some of > the LUNs have moved around as I rebuilt the machine to match its partner which > the VMs RAID 1 it against. > >> What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf? > > Looks like 120 but just in case, here's the entire contents: > > scsid.startup = /etc/rc.d/init.d/iscsid force-start > node.startup = automatic > node.leading_login = No > node.session.timeo.replacement_timeout = 120 > node.conn[0].timeo.login_timeout = 15 > node.conn[0].timeo.logout_timeout = 15 > node.conn[0].timeo.noop_out_interval = 5 > node.conn[0].timeo.noop_out_timeout = 5 > node.session.err_timeo.abort_timeout = 15 > node.session.err_timeo.lu_reset_timeout = 30 > node.session.err_timeo.tgt_reset_timeout = 30 > node.session.initial_login_retry_max = 8 > node.session.cmds_max = 128 > node.session.queue_depth = 32 > node.session.xmit_thread_priority = -20 > node.session.iscsi.InitialR2T = No > node.session.iscsi.ImmediateData = Yes > node.session.iscsi.FirstBurstLength = 262144 > node.session.iscsi.MaxBurstLength = 16776192 > node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144 > node.conn[0].iscsi.MaxXmitDataSegmentLength = 0 > discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768 > node.conn[0].iscsi.HeaderDigest = None > node.session.nr_sessions = 1 > node.session.iscsi.FastAbort = Yes > > See anything amiss? I now have around 8 processes stuck on this system. I'm > going to have to reboot it this weekend to clear up the issue but I would > really like to find out what is really going on and how to avoid it before > taking such measures. > >> It sounds like the scsi scan IO is stuck on a target that disappeared >> and never came back, or it is a Centos scsi layer bug. Could you send >> the /var/log/messages. > > The entire file is rather large but here are some of the messages relevant to > iscsi: > > Jul 4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error > (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) > Jul 4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error > (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) > Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error > (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) > Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > Jul 4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > Jul 4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > <skip many of these no route to host messages, happened while I was > rebuilding the target with ip 10.0.1.11> > Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 > secs > Jul 4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to > host) > Jul 4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 > secs > Jul 4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 > secs > Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > Jul 8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection > refused) > <skip lots of these connection refused messages) > Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020) > Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020) > Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020) > Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > <skip lots of these> > Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 > attempts) > Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target > not found (02/03) > Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 > attempts) > Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 > attempts) > Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP > Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE > throughout and latency could be affected. > Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID IET > Controller 0001 PQ: 0 ANSI: 5 > Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type > 12 > Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0 > Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0 > Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: > iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through > [iface: default] is operational now > Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID IET > Controller 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type > 12 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0 > Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0 > Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE > throughout and latency could be affected. > Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: > iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through > [iface: default] is operational now > Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP > Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE > throughout and latency could be affected. > Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID IET > Controller 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type > 12 > Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0 > Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0 > Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: > iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through > [iface: default] is operational now > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume > Write Enabled > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed > Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write > through > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume > Write Enabled > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write > through > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume > Write Enabled > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed > Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write > through > Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready > Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Sense Key : Illegal > Request [current] > Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Add. Sense: Logical unit > not supported > Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed > Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > <skip lots of these> > Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0 > Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access IET > VIRTUAL-DISK 0001 PQ: 0 ANSI: 5 > Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0 >
I am not sure. The problem is that the code in Centos is older and I do not know what other patches they have put in there. Did you delete the luns from userspace at some point? Run the scsi rescan bug script or write to the remote/delete sysfs file or did you run the iscsi logout command? Did you do some login command after you did your management stuff? It looks like you created new sessions but the old ones were left there? Are the /dev/sdX nodes for the old sessions still there? What does iscsiadm -m session -P 3 report? It looks like you might be hitting a bug in older kernels where when devices/sessions are removed the scsi layer could cause hangs. The iscsi rescan command then gets caught on that. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/groups/opt_out.
