Florian,
the problem here seems to be with the network. The nodes are running into 
the network heartbeat timeout and hence the second node is getting fenced. 
Do you see the o2net thread consuming 100% CPU on any node? If not, then 
the network itself is probably what needs checking.
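For example, something along these lines could be used for a quick look 
(a rough sketch, assuming the standard procps tools are installed):

  # one-shot snapshot of CPU usage, filtered for the o2net kernel thread
  top -b -n 1 | grep o2net

  # per-thread CPU usage, in case o2net only shows up as a thread
  ps -eLo pid,comm,pcpu | grep o2net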
thanks,
--Srini

florian.engelm...@bt.com wrote:
> Hello,
> our Debian etch cluster nodes are panicking because of ocfs2 fencing if
> one SAN path fails.
>
> modinfo ocfs2
> filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/ocfs2.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_dlm,ocfs2_nodemanager,jbd
> srcversion:     0798424846E68F10172C203
>
> modinfo ocfs2_dlmfs
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlmfs.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 DLMFS 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_dlm,ocfs2_nodemanager
> srcversion:     E3780E12396118282B3C1AD
>
> defr1elcbtd02:~# modinfo ocfs2_dlm
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 DLM 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_nodemanager
> srcversion:     7DC395EA08AE4CE826C5B92
>
> modinfo ocfs2_nodemanager
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/cluster/ocfs2_nodemanager.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 Node Manager 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        configfs
> srcversion:     C4C9871302E1910B78DAE40
>
> modinfo qla2xxx
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> author:         QLogic Corporation
> description:    QLogic Fibre Channel HBA Driver
> license:        GPL
> version:        8.01.07-k1
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        scsi_mod,scsi_transport_fc,firmware_class
> alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
> alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
> alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
> alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
> alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
> srcversion:     B8E1608E257391DCAFD9224
> parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 -
> no FDMI. 1 - perfom FDMI. (int)
> parm:           extended_error_logging:Option to enable extended error
> logging, Default is 0 - no logging. 1 - log errors. (int)
> parm:           ql2xallocfwdump:Option to enable allocation of memory
> for a firmware dump during HBA initialization.  Memory allocation
> requirements vary by ISP type.  Default is 1 - allocate memory. (int)
> parm:           ql2xloginretrycount:Specify an alternate value for the
> NVRAM login retry count. (int)
> parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices
> that are not present after a Fabric scan.  This is needed for several
> broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
> parm:           qlport_down_retry:Maximum number of command retries to a
> port that returns a PORT-DOWN status. (int)
> parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
>
> modinfo dm_multipath
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-multipath.ko
> description:    device-mapper multipath target
> author:         Sistina Software <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        dm-mod
>
> modinfo dm_mod
> filename:       /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-mod.ko
> description:    device-mapper driver
> author:         Joe Thornber <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:
> parm:           major:The major number of the device mapper (uint)
>
> modinfo dm_round_robin
> filename:
> /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-round-robin.ko
> description:    device-mapper round-robin multipath path selector
> author:         Sistina Software <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        dm-multipath
>
> There is no self-compiled software; only the official repositories were
> used.
> The nodes are connected to our two independent SANs. The storage systems
> are EMC Clariion CX3-20f and EMC Clariion CX500.
>
> multipath.conf:
> defaults {
>         rr_min_io                       1000
>         polling_interval                2
>         no_path_retry                   5
>         user_friendly_names             yes
> }
>
> blacklist {
>         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>         devnode "^hd[a-z][[0-9]*]"
>         devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
>         device {
>                 vendor "DGC"
>                 product "LUNZ" # EMC Management LUN
>         }
>         device {
>                 vendor "ATA"  #We do not need mutlipathing for local
> drives
>                 product "*"
>         }
>         device {
>                 vendor "AMI" # No multipathing for SUN Virtual devices
>                 product "*"
>         }
>         device {
>                 vendor "HITACHI" # No multipathing for local scsi disks
>                 product "H101414SCSUN146G"
>         }
> }
>
> devices {
>         ## Device attributes for EMC CLARiiON
>         device {
>                 vendor                  "DGC"
>                 product                 "*"
>                 path_grouping_policy    group_by_prio
>                 getuid_callout          "/sbin/scsi_id -g -u -s
> /block/%n"
>                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
>                 hardware_handler        "1 emc"
>                 features                "1 queue_if_no_path"
>                 no_path_retry           fail
>                 path_checker            emc_clariion
>                 path_selector           "round-robin 0"
>                 failback                immediate
>                 user_friendly_names     yes
>         }
> }
>
> multipaths {
>         multipath {
>                 wwid
> 3600601603ac511001c7c92fec775dd11
>                 alias                   stosan01_lun070
>         }
> }
>
> multipath -ll:
> stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> [size=133G][features=0][hwhandler=1 emc]
> \_ round-robin 0 [prio=2][active]
>  \_ 0:0:1:1 sdd 8:48  [active][ready]
>  \_ 1:0:1:1 sdh 8:112 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:1 sdb 8:16  [active][ready]
>  \_ 1:0:0:1 sdf 8:80  [active][ready]
>
>
> As we use lvm2, we added /dev/sd* to the filter's reject rules:
> filter = [ "r|/dev/cdrom|", "r|/dev/sd.*|" ]
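>
> (As a quick sanity check, and assuming the filter is set in
> /etc/lvm/lvm.conf, something like the following should list only
> /dev/mapper/* or dm-* devices and no /dev/sd* paths:
>
>   pvscan
>   pvs -o pv_name,vg_name
>
> If any /dev/sd* device still shows up, the filter is not being applied.)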
>
> Here is what happened and what we did to reproduce the situation in order
> to find a solution:
>
> On 02.06.2009 we made a zoning mistake on one of our two SANs and all
> servers (about 40) lost one path to the SAN. Only two servers crashed:
> the two Debian etch heartbeat cluster nodes described above.
> The console showed a kernel panic because ocfs2 was fencing both nodes.
>
> This was the message:
> O2hb_write_timeout: 165 ERROR: Heartbeat write timeout to device dm-7
> after 12000 milliseconds
>
> So we decided to change the o2cb settings to:
> O2CB_HEARTBEAT_THRESHOLD=31
> O2CB_IDLE_TIMEOUT_MS=30000
> O2CB_KEEPALIVE_DELAY_MS=2000
> O2CB_RECONNECT_DELAY_MS=2000
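>
> (For reference, assuming the usual o2cb formula applies, the disk heartbeat
> timeout works out as (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds:
>
>   threshold  7 (presumably the old default) -> (7 - 1)  * 2s = 12s,
>                                                matching the 12000 ms above
>   threshold 31                              -> (31 - 1) * 2s = 60s
>
> On Debian these settings normally live in /etc/default/o2cb.)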
>
> We switched all cluster resources to the 1st node to test the new
> settings on the second node. We removed the 2nd node from the zoning (we
> also tested shutting down the port, with the same result) and got a
> different error, but still ended up with a kernel panic:
>
> Jun  4 16:41:05 defr1elcbtd02 kernel: o2net: no longer connected to node
> defr1elcbtd01 (num 0) at 192.168.0.101:7777
> Jun  4 16:41:27 defr1elcbtd02 kernel:  rport-0:0-0: blocked FC remote
> port time out: removing target and saving binding
> Jun  4 16:41:27 defr1elcbtd02 kernel:  rport-0:0-1: blocked FC remote
> port time out: removing target and saving binding
> Jun  4 16:41:27 defr1elcbtd02 kernel: sd 0:0:1:1: SCSI error: return
> code = 0x00010000
> Jun  4 16:41:27 defr1elcbtd02 kernel: end_request: I/O error, dev sdd,
> sector 1672
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> path 8:48.
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> path 8:16.
> Jun  4 16:41:27 defr1elcbtd02 kernel: scsi 0:0:1:1: rejecting I/O to
> device being removed
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> trespass command will be send
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> honor reservation bit will not be set (default)
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: table: 253:7:
> multipath: error getting device
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: ioctl: error adding
> target to table
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> trespass command will be send
> Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> honor reservation bit will not be set (default)
> Jun  4 16:41:29 defr1elcbtd02 kernel: device-mapper: multipath emc:
> emc_pg_init: sending switch-over command
> Jun  4 16:42:01 defr1elcbtd02 kernel:
> (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> Jun  4 16:42:01 defr1elcbtd02 kernel:
> (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> waiting 5000ms for notification of death of node 0
> Jun  4 16:42:07 defr1elcbtd02 kernel:
> (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> Jun  4 16:42:07 defr1elcbtd02 kernel:
> (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> waiting 5000ms for notification of death of node 0
> [...]
> After 60 seconds:
>
> (8,0): o2quo_make_decision:143 ERROR: fencing this node because it is
> connected to a half-quorum of 1 out of 2 nodes which doesn't include the
> lowest active node 0
>
>
> multipath -ll changed to:
> stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> [size=133G][features=0][hwhandler=1 emc]
> \_ round-robin 0 [prio=1][active]
>  \_ 0:0:1:1 sdd 8:48  [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:1 sdb 8:16  [active][ready]
>
> The ocfs2 filesystem is still mounted and writable. Even if I enable the
> zoning (or the FC port) again within the 60 seconds, ocfs2 does not
> reconnect to node 1 and panics the kernel after 60 seconds, while
> multipath -ll shows both paths again.
>
> I do not understand at all what the Ethernet heartbeat connection of
> ocfs2 has to do with the SAN connection.
>
> The strangest thing of all is that this does not always happen! After some
> reboots the system keeps running stably even if I shut down an FC port and
> enable it again many times. There is no consistent behaviour... It happens
> most of the time, but in about 10% of cases it does not happen and
> everything works as intended.
>
> Any explanations or ideas about what causes this behaviour?
>
> I will test this on Debian lenny to see if the Debian version makes a
> difference.
>
> Best regards,
> Florian
>

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
