One of our DRBD clusters has 47 LUNs being published.
We're using RHEL 6.4.  Here are the relevant package versions:
---
pacemaker-1.1.7-6.el6.x86_64
corosync-1.4.1-7.el6.x86_64
resource-agents-3.9.2-12.el6.x86_64
scsi-target-utils-1.0.24-2.el6.x86_64

        Somewhere after 40 LUNs, we started seeing monitor
failures on the LUNs most recently added to the cluster.  Things
like:
---
Jul 26 23:47:39 [8557] stor01a       crmd:     info: process_lrm_event: LRM operation lun47_monitor_10000 (call=357, rc=7, cib-update=6790, confirmed=false) not running
Jul 26 23:47:39 [8557] stor01a       crmd:     info: process_graph_event: Detected action lun47_monitor_10000 from a different transition: 5737 vs. 5793
Jul 26 23:47:39 [8557] stor01a       crmd:     info: abort_transition_graph: process_graph_event:476 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=lun47_last_failure_0, magic=0:7;192:5737:0:e16c8e9d-87ed-4132-a3b2-724a30b6cc73, cib=0.111.47) : Old event
Jul 26 23:47:39 [8557] stor01a       crmd:  warning: update_failcount: Updating failcount for lun47 on stor01a after failed monitor: rc=7 (update=value++, time=1374900459)
Jul 26 23:47:39 [8555] stor01a      attrd:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-lun47 (1)
Jul 26 23:47:39 [8557] stor01a       crmd:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 26 23:47:39 [8555] stor01a      attrd:   notice: attrd_perform_update: Sent update 438: fail-count-lun47=1
Jul 26 23:47:39 [8555] stor01a      attrd:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-lun47 (1374900459)
Jul 26 23:47:39 [8557] stor01a       crmd:     info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-stor01a-fail-count-lun47, name=fail-count-lun47, value=1, magic=NA, cib=0.111.48) : Transient attribute: update
Jul 26 23:47:39 [8555] stor01a      attrd:   notice: attrd_perform_update: Sent update 441: last-failure-lun47=1374900459
---
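        With dozens of LUN resources, it can help to pull just the
failed-monitor events out of the cluster log and translate the
return code.  Below is a minimal sketch; failed_monitors and
ocf_rc_name are hypothetical helper names, it assumes one log record
per line (the excerpt above was wrapped by the mailer), and the table
is the standard OCF return-code list, in which 7 is OCF_NOT_RUNNING.

```shell
#!/bin/sh
# Sketch: summarize "failed monitor" events from a Pacemaker log.
# failed_monitors and ocf_rc_name are hypothetical helpers.

# Map an OCF resource-agent return code to its symbolic name.
ocf_rc_name() {
    case $1 in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo unknown ;;
    esac
}

# Read log lines on stdin; for each crmd "Updating failcount" record,
# print "resource node rc=N (NAME)". Other lines are ignored.
failed_monitors() {
    sed -n 's/.*Updating failcount for \([^ ]*\) on \([^ ]*\) after failed monitor: rc=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r rsc node rc; do
        echo "$rsc $node rc=$rc ($(ocf_rc_name "$rc"))"
    done
}
```

Feed it whichever file crmd logs to on your nodes, for example
`failed_monitors < LOGFILE | sort | uniq -c`, to count failures per
LUN and node.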

        So I decided to modify the resource agent as follows:
---
--- iSCSILogicalUnit.orig       2013-08-05 12:15:03.185879119 -0500
+++ iSCSILogicalUnit    2013-08-01 11:31:24.768133374 -0500
@@ -305,12 +305,28 @@
            if [ -z "$TID" ]; then
                # Our target is not configured, thus we're not
                # running.
+               echo "$(date) TID not found: ${TID}." >> /var/log/iscsi-ra.log
                return $OCF_NOT_RUNNING
            fi
            # This only looks for the backing store, but does not test
            # for the correct target ID and LUN.
-           tgtadm --lld iscsi --op show --mode target \
+           tgt_output=$(tgtadm --lld iscsi --op show --mode target)
+           echo "$tgt_output" \
                | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}" && return $OCF_SUCCESS
+           echo "$(date) first LUN failure: ${OCF_RESKEY_path}" >> /var/log/iscsi-ra.log
+           echo "$tgt_output" >> /var/log/iscsi-ra.log
+           sleep 1
+           tgt_output=$(tgtadm --lld iscsi --op show --mode target)
+           echo "$tgt_output" \
+               | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}" && return $OCF_SUCCESS
+           echo "$(date) second LUN failure: ${OCF_RESKEY_path}" >> /var/log/iscsi-ra.log
+           echo "$tgt_output" >> /var/log/iscsi-ra.log
+           sleep 1
+           tgt_output=$(tgtadm --lld iscsi --op show --mode target)
+           echo "$tgt_output" \
+               | grep -E -q "[[:space:]]+Backing store.*: ${OCF_RESKEY_path}" && return $OCF_SUCCESS
+           echo "$(date) third LUN failure: ${OCF_RESKEY_path}" >> /var/log/iscsi-ra.log
+           echo "$tgt_output" >> /var/log/iscsi-ra.log
            ;;
        lio)
            configfs_path="/sys/kernel/config/target/iscsi/${OCF_RESKEY_target_iqn}/tpgt_1/lun/lun_${OCF_RESKEY_lun}/${OCF_RESOURCE_INSTANCE}/udev_path"
---
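        The three copy-pasted attempts in the patch can also be
written as a loop.  This is only a sketch of that idea, not the stock
agent's code: tgt_backing_store_present, MAX_TRIES and RA_LOG are
hypothetical names, the grep pattern is the one the agent already
uses, and in the real agent the OCF_* codes come from ocf-shellfuncs
rather than the defaults set here.

```shell
#!/bin/sh
# Sketch only: the patch's repeated tgtadm checks folded into a loop.
# tgt_backing_store_present, MAX_TRIES and RA_LOG are hypothetical names;
# in the real agent OCF_SUCCESS/OCF_NOT_RUNNING come from ocf-shellfuncs.
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"
: "${MAX_TRIES:=3}"                     # attempts before giving up
: "${RA_LOG:=/var/log/iscsi-ra.log}"    # same debug log as the patch

# Succeed if $1 appears as a backing store in the tgt listing, retrying
# (1 s apart) to ride out a truncated tgtadm response.
tgt_backing_store_present() {
    local path="$1" try=1 tgt_output
    while [ "$try" -le "$MAX_TRIES" ]; do
        tgt_output=$(tgtadm --lld iscsi --op show --mode target)
        # Same pattern the stock agent greps for.
        if printf '%s\n' "$tgt_output" \
            | grep -E -q "[[:space:]]+Backing store.*: ${path}"; then
            return "$OCF_SUCCESS"
        fi
        # Log the failed attempt together with the listing that was seen.
        {
            echo "$(date) LUN check failure ${try}/${MAX_TRIES}: ${path}"
            printf '%s\n' "$tgt_output"
        } >> "$RA_LOG"
        [ "$try" -lt "$MAX_TRIES" ] && sleep 1
        try=$((try + 1))
    done
    return "$OCF_NOT_RUNNING"
}
```

Inside the tgt) branch of the monitor, the patched block would then
reduce to a single call such as
`tgt_backing_store_present "${OCF_RESKEY_path}" && return $OCF_SUCCESS`,
leaving whatever the agent does after the case statement unchanged.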

        And over the weekend I got a hit from this, but only on
the first attempt.  The output from iscsi-ra.log:
---
Sun Aug  4 10:54:41 CDT 2013 first LUN failure: /dev/stor01/vm-www01
Target 1: iqn.2013-04.net.bitgnome:vh-storage01
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 17
            Initiator: iqn.1994-05.com.redhat:b8998f3aaa11
            Connection: 0
                IP Address: 172.16.165.18
        I_T nexus: 18
            Initiator: iqn.1994-05.com.redhat:36ad8852a96d
            Connection: 0
                IP Address: 172.16.165.19
        I_T nexus: 19
            Initiator: iqn.1994-05.com.redhat:28d6b194ab
            Connection: 0
                IP Address: 172.16.165.20
        I_T nexus: 20
            Initiator: iqn.1994-05.com.redhat:bc9afc47c4
            Connection: 0
                IP Address: 172.16.165.21
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags: 
        LUN: 1
            Type: disk
            SCSI ID: lun1
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ldap1
            Backing store flags: 
        LUN: 2
            Type: disk
            SCSI ID: lun2
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-arcgis
            Backing store flags: 
        LUN: 3
            Type: disk
            SCSI ID: lun3
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-mail1
            Backing store flags: 
        LUN: 4
            Type: disk
            SCSI ID: lun4
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-mail2
            Backing store flags: 
        LUN: 5
            Type: disk
            SCSI ID: lun5
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-wp2
            Backing store flags: 
        LUN: 6
            Type: disk
            SCSI ID: lun6
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ldap-slave1
            Backing store flags: 
        LUN: 7
            Type: disk
            SCSI ID: lun7
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ldap-slave2
            Backing store flags: 
        LUN: 8
            Type: disk
            SCSI ID: lun8
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ldap-slave3
            Backing store flags: 
        LUN: 9
            Type: disk
            SCSI ID: lun9
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-wp1
            Backing store flags: 
        LUN: 10
            Type: disk
            SCSI ID: lun10
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-support
            Backing store flags: 
        LUN: 11
            Type: disk
            SCSI ID: lun11
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-cache1
            Backing store flags: 
        LUN: 12
            Type: disk
            SCSI ID: lun12
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-cache2
            Backing store flags: 
        LUN: 13
            Type: disk
            SCSI ID: lun13
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-proxy
            Backing store flags: 
        LUN: 14
            Type: disk
            SCSI ID: lun14
            SCSI SN: (stdin)=
            Size: 53687 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-pcspine
            Backing store flags: 
        LUN: 15
            Type: disk
            SCSI ID: lun15
            SCSI SN: (stdin)=
            Size: 53687 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-print
            Backing store flags: 
        LUN: 16
            Type: disk
            SCSI ID: lun16
            SCSI SN: (stdin)=
            Size: 53687 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ad
            Backing store flags: 
        LUN: 17
            Type: disk
            SCSI ID: lun17
            SCSI SN: (stdin)=
            Size: 53687 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-pcbrain
            Backing store flags: 
        LUN: 18
            Type: disk
            SCSI ID: lun18
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-xmpp
            Backing store flags: 
        LUN: 19
            Type: disk
            SCSI ID: lun19
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-pma
            Backing store flags: 
        LUN: 20
            Type: disk
            SCSI ID: lun20
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-cake
            Backing store flags: 
        LUN: 21
            Type: disk
            SCSI ID: lun21
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-ica-file
            Backing store flags: 
        LUN: 22
            Type: disk
            SCSI ID: lun22
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-liwc
            Backing store flags: 
        LUN: 23
            Type: disk
            SCSI ID: lun23
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-lasso
            Backing store flags: 
        LUN: 24
            Type: disk
            SCSI ID: lun24
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-qt
            Backing store flags: 
        LUN: 25
            Type: disk
            SCSI ID: lun25
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-public
            Backing store flags: 
        LUN: 26
            Type: disk
            SCSI ID: lun26
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-source
            Backing store flags: 
        LUN: 27
            Type: disk
            SCSI ID: lun27
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-gmc
            Backing store flags: 
        LUN: 28
            Type: disk
            SCSI ID: lun28
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-solr
            Backing store flags: 
        LUN: 29
            Type: disk
            SCSI ID: lun29
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-license
            Backing store flags: 
        LUN: 30
            Type: disk
            SCSI ID: lun30
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-media
            Backing store flags: 
        LUN: 31
            Type: disk
            SCSI ID: lun31
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-opera
            Backing store flags: 
        LUN: 32
            Type: disk
            SCSI ID: lun32
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-asl
            Backing store flags: 
        LUN: 33
            Type: disk
            SCSI ID: lun33
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-daseupload
            Backing store flags: 
        LUN: 34
            Type: disk
            SCSI ID: lun34
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-arcsde
            Backing store flags: 
        LUN: 35
            Type: disk
            SCSI ID: lun35
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-switchwitch
            Backing store flags: 
        LUN: 36
            Type: disk
            SCSI ID: lun36
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-matlab
            Backing store flags: 
        LUN: 37
            Type: disk
            SCSI ID: lun37
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-spintx
            Backing store flags: 
        LUN: 38
            Type: disk
            SCSI ID: lun38
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-atlassian
            Backing store flags: 
        LUN: 39
            Type: disk
            SCSI ID: lun39
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-test3
            Backing store flags: 
        LUN: 40
            Type: disk
            SCSI ID: lun40
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-nfs
            Backing store flags: 
        LUN: 41
            Type: disk
            SCSI ID: lun41
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-test4
            Backing store flags: 
        LUN: 42
            Type: disk
            SCSI ID: lun42
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-bamboo
            Backing store flags: 
        LUN: 43
            Type: disk
            SCSI ID: lun43
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-wowza-test
            Backing store flags: 
        LUN: 44
            Type: disk
            SCSI ID: lun44
            SCSI SN: (stdin)=
            Size: 53687 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-abman-dev
            Backing store flags: 
        LUN: 45
            Type: disk
            SCSI ID: lun45
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-workflow
            Backing store flags: 
        LUN: 46
            Type: disk
            SCSI ID: lun46
            SCSI SN: (stdin)=
            Size: 10737 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/stor01/vm-psyimage
            Backing store flags: 
        LUN: 47
            Type: disk
            SCSI ID: lun47
            SCSI SN: (stdin)=
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent
---
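        The listing above stops partway through LUN 47's block,
which suggests a cheap completeness test before trusting the grep.
A heuristic sketch follows; tgt_output_complete is a hypothetical
name, and it assumes (from this scsi-target-utils 1.0.24 output) that
every complete LUN block ends with a "Backing store flags:" line.  For
the listing above it would count 48 "LUN:" lines (0 through 47)
against 47 "Backing store flags:" lines and report the output as
incomplete.

```shell
#!/bin/sh
# Heuristic sketch (assumption based on the tgtadm listing shown above):
# every complete LUN block ends with a "Backing store flags:" line, so a
# listing cut off inside a block has more "LUN: N" lines than
# "Backing store flags:" lines. Output that stops cleanly between two
# blocks is NOT detected.

# Read tgtadm output on stdin; exit 0 if it looks complete, 1 otherwise.
tgt_output_complete() {
    awk '
        /^[ \t]*LUN: [0-9]/           { luns++ }
        /^[ \t]*Backing store flags:/ { flags++ }
        END { exit !(luns > 0 && luns == flags) }
    '
}
```

For example:
`tgtadm --lld iscsi --op show --mode target | tgt_output_complete || echo "listing looks truncated"`.
Since it cannot catch output that ends exactly on a block boundary,
it complements a retry rather than replacing it.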

        So tgtadm clearly returned incomplete output the first
time and the complete listing the second time, so the monitor
succeeded instead of returning 7 (OCF_NOT_RUNNING).  I found a
discussion from back in 2008 of tgtd crashing with more than 40 LUNs:
---
http://lists.wpkg.org/pipermail/stgt/2008-December/002528.html

But I couldn't find anything else related to this problem
specifically.

        Has anyone else seen weirdness like this from tgtd?  I
assume the "easy" answer is to switch to a newer distribution with
LIO, or to just keep the multiple checks in place to work around the
problem.
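        To check whether tgtd truncates its listing independently of
Pacemaker, one option is to run the same tgtadm command many times in
a row and tally the output sizes.  A sketch, with tgt_show and
tgt_size_histogram as hypothetical helper names:

```shell
#!/bin/sh
# Diagnostic sketch: tally the byte size of N consecutive tgt listings.
# tgt_show and tgt_size_histogram are hypothetical helper names.

# The same listing the iSCSILogicalUnit monitor parses.
tgt_show() {
    tgtadm --lld iscsi --op show --mode target
}

# Print "count size" pairs for $1 runs, smallest size first.
tgt_size_histogram() {
    local runs="$1" i=0
    while [ "$i" -lt "$runs" ]; do
        tgt_show | wc -c
        i=$((i + 1))
    done | sort -n | uniq -c
}
```

Running something like `tgt_size_histogram 500` on the active node
would be expected to print a single line on a stable configuration;
additional, smaller sizes point to truncated responses.  Initiators
logging in or out also change the size, so a second size is a hint
rather than proof.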

-- 
Mark Nipper
[email protected] (XMPP)
+1 979 575 3193
-
In theory there is no difference between theory and practice. In
practice there is.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems