Florian, the problem here seems to be with the network. The nodes are running into the network heartbeat timeout, and hence the second node is getting fenced. Do you see the o2net thread consuming 100% CPU on any node? If not, then check your network.

thanks,
--Srini
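A quick way to check that on each node (a minimal sketch; the kernel thread is normally just called "o2net" in this ocfs2 version, but verify the name on your boxes):

  # CPU usage of the o2net kernel thread
  ps -eo pid,pcpu,comm | grep '[o]2net'

  # or watch it for a while in batch mode; a thread pinned near 100% CPU
  # is what to look for
  top -b -d 2 -n 15 | grep o2net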
florian.engelm...@bt.com wrote:
> Hello,
> our Debian etch cluster nodes are panicking because of ocfs2 fencing if
> one SAN path fails.
>
> modinfo ocfs2
> filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/ocfs2.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_dlm,ocfs2_nodemanager,jbd
> srcversion:     0798424846E68F10172C203
>
> modinfo ocfs2_dlmfs
> filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlmfs.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 DLMFS 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_dlm,ocfs2_nodemanager
> srcversion:     E3780E12396118282B3C1AD
>
> defr1elcbtd02:~# modinfo ocfs2_dlm
> filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 DLM 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        ocfs2_nodemanager
> srcversion:     7DC395EA08AE4CE826C5B92
>
> modinfo ocfs2_nodemanager
> filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/cluster/ocfs2_nodemanager.ko
> author:         Oracle
> license:        GPL
> description:    OCFS2 Node Manager 1.3.3
> version:        1.3.3
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        configfs
> srcversion:     C4C9871302E1910B78DAE40
>
> modinfo qla2xxx
> filename:       /lib/modules/2.6.18-6-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> author:         QLogic Corporation
> description:    QLogic Fibre Channel HBA Driver
> license:        GPL
> version:        8.01.07-k1
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        scsi_mod,scsi_transport_fc,firmware_class
> alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
> alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
> alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
> alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
> alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
> alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
> srcversion:     B8E1608E257391DCAFD9224
> parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 -
>                 no FDMI. 1 - perfom FDMI. (int)
> parm:           extended_error_logging:Option to enable extended error
>                 logging, Default is 0 - no logging. 1 - log errors. (int)
> parm:           ql2xallocfwdump:Option to enable allocation of memory
>                 for a firmware dump during HBA initialization. Memory allocation
>                 requirements vary by ISP type. Default is 1 - allocate memory. (int)
> parm:           ql2xloginretrycount:Specify an alternate value for the
>                 NVRAM login retry count. (int)
> parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices
>                 that are not present after a Fabric scan. This is needed for several
>                 broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
> parm:           qlport_down_retry:Maximum number of command retries to a
>                 port that returns a PORT-DOWN status. (int)
> parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
>
> modinfo dm_multipath
> filename:       /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-multipath.ko
> description:    device-mapper multipath target
> author:         Sistina Software <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        dm-mod
>
> modinfo dm_mod
> filename:       /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-mod.ko
> description:    device-mapper driver
> author:         Joe Thornber <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:
> parm:           major:The major number of the device mapper (uint)
>
> modinfo dm_round_robin
> filename:       /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-round-robin.ko
> description:    device-mapper round-robin multipath path selector
> author:         Sistina Software <dm-de...@redhat.com>
> license:        GPL
> vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> depends:        dm-multipath
>
> There is no self-compiled software; only the official repository was
> used.
> The nodes are connected to our two independent SANs. The storage systems
> are EMC Clariion CX3-20f and EMC Clariion CX500.
>
> multipath.conf:
> defaults {
>         rr_min_io               1000
>         polling_interval        2
>         no_path_retry           5
>         user_friendly_names     yes
> }
>
> blacklist {
>         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
>         devnode "^hd[a-z][[0-9]*]"
>         devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
>         device {
>                 vendor  "DGC"
>                 product "LUNZ"        # EMC Management LUN
>         }
>         device {
>                 vendor  "ATA"         # we do not need multipathing for local drives
>                 product "*"
>         }
>         device {
>                 vendor  "AMI"         # no multipathing for SUN virtual devices
>                 product "*"
>         }
>         device {
>                 vendor  "HITACHI"     # no multipathing for local scsi disks
>                 product "H101414SCSUN146G"
>         }
> }
>
> devices {
>         ## Device attributes for EMC CLARiiON
>         device {
>                 vendor                  "DGC"
>                 product                 "*"
>                 path_grouping_policy    group_by_prio
>                 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
>                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
>                 hardware_handler        "1 emc"
>                 features                "1 queue_if_no_path"
>                 no_path_retry           fail
>                 path_checker            emc_clariion
>                 path_selector           "round-robin 0"
>                 failback                immediate
>                 user_friendly_names     yes
>         }
> }
>
> multipaths {
>         multipath {
>                 wwid    3600601603ac511001c7c92fec775dd11
>                 alias   stosan01_lun070
>         }
> }
>
> multipath -ll:
> stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> [size=133G][features=0][hwhandler=1 emc]
> \_ round-robin 0 [prio=2][active]
>  \_ 0:0:1:1 sdd 8:48  [active][ready]
>  \_ 1:0:1:1 sdh 8:112 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:1 sdb 8:16  [active][ready]
>  \_ 1:0:0:1 sdf 8:80  [active][ready]
>
> As we use lvm2 we added /dev/sd* to the filter:
> filter = [ "r|/dev/cdrom|", "r|/dev/sd.*|" ]
>
> Here is what happened and what we did to reconstruct the situation to
> find a solution:
>
> On 02.06.2009 we did something wrong with the zoning on one of our two
> SANs and all servers (about 40) lost one path to the SAN. Only two
> servers crashed. Those two are the Debian etch heartbeat cluster nodes
> described above.
> The console showed a kernel panic because ocfs2 was fencing both
> nodes.
>
> This was the message:
> o2hb_write_timeout:165 ERROR: Heartbeat write timeout to device dm-7
> after 12000 milliseconds
>
> So we decided to change the o2cb settings to:
> O2CB_HEARTBEAT_THRESHOLD=31
> O2CB_IDLE_TIMEOUT_MS=30000
> O2CB_KEEPALIVE_DELAY_MS=2000
> O2CB_RECONNECT_DELAY_MS=2000
>
> We switched all cluster resources to the 1st node to test the new
> settings on the second node.
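For reference, the disk heartbeat timeout works out to roughly (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds with the default 2-second heartbeat interval, so the old default threshold of 7 matches the 12000 ms in the message above and a threshold of 31 gives about 60 seconds. A minimal sketch of applying and verifying such a change on Debian, assuming the stock /etc/default/o2cb location, a cluster named "ocfs2", and an o2cb version that exposes these configfs attributes:

  # the same values should be set on every node in the cluster;
  # o2cb has to be restarted (ocfs2 filesystems unmounted) for them to take effect
  vi /etc/default/o2cb
  /etc/init.d/o2cb restart

  # check what the running cluster is actually using
  cat /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold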
> We removed the 2nd node from the zoning (we also tested shutting down
> the port, with the same result) and got a different error, but still
> ended up with a kernel panic:
>
> Jun 4 16:41:05 defr1elcbtd02 kernel: o2net: no longer connected to node
> defr1elcbtd01 (num 0) at 192.168.0.101:7777
> Jun 4 16:41:27 defr1elcbtd02 kernel: rport-0:0-0: blocked FC remote
> port time out: removing target and saving binding
> Jun 4 16:41:27 defr1elcbtd02 kernel: rport-0:0-1: blocked FC remote
> port time out: removing target and saving binding
> Jun 4 16:41:27 defr1elcbtd02 kernel: sd 0:0:1:1: SCSI error: return
> code = 0x00010000
> Jun 4 16:41:27 defr1elcbtd02 kernel: end_request: I/O error, dev sdd,
> sector 1672
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> path 8:48.
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> path 8:16.
> Jun 4 16:41:27 defr1elcbtd02 kernel: scsi 0:0:1:1: rejecting I/O to
> device being removed
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> trespass command will be send
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> honor reservation bit will not be set (default)
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: table: 253:7:
> multipath: error getting device
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: ioctl: error adding
> target to table
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> trespass command will be send
> Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> honor reservation bit will not be set (default)
> Jun 4 16:41:29 defr1elcbtd02 kernel: device-mapper: multipath emc:
> emc_pg_init: sending switch-over command
> Jun 4 16:42:01 defr1elcbtd02 kernel:
> (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> Jun 4 16:42:01 defr1elcbtd02 kernel:
> (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> waiting 5000ms for notification of death of node 0
> Jun 4 16:42:07 defr1elcbtd02 kernel:
> (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> Jun 4 16:42:07 defr1elcbtd02 kernel:
> (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> waiting 5000ms for notification of death of node 0
> [...]
>
> After 60 seconds:
>
> (8,0):o2quo_make_decision:143 ERROR: fencing this node because it is
> connected to a half-quorum of 1 out of 2 nodes which doesn't include the
> lowest active node 0
>
> multipath -ll changed to:
> stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> [size=133G][features=0][hwhandler=1 emc]
> \_ round-robin 0 [prio=1][active]
>  \_ 0:0:1:1 sdd 8:48 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:1 sdb 8:16 [active][ready]
>
> The ocfs2 filesystem is still mounted and writable. Even if I enable the
> zoning (or the FC port) again within the 60 seconds, ocfs2 does not
> reconnect to node 1 and panics the kernel after 60 seconds, while
> multipath -ll shows both paths again.
>
> I do not understand at all what the Ethernet heartbeat connection of
> ocfs2 has to do with the SAN connection.
>
> The strangest thing of all is that this does not always happen! After some
> reboots the system keeps running stable even if I shut down an FC port and
> enable it again many times. There is no consistent behaviour... It happens
> most of the time, but in about 10% of the cases it does not happen and
> everything works as intended.
>
> Any explanations or ideas what causes this behaviour?
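Regarding the network side, it is worth confirming whether the o2net TCP connection between the nodes actually survives the FC failover (a sketch; the peer address 192.168.0.101 and port 7777 are taken from the o2net message above, adjust for your interconnect):

  # is the o2net connection to the other node still established?
  netstat -tn | grep ':7777'

  # basic reachability/latency of the interconnect while the test runs
  ping -c 5 192.168.0.101

If the o2net connection itself drops (as the "no longer connected to node defr1elcbtd01" message suggests), that would line up with the quorum/fencing messages above rather than with the SAN path state as such.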
> I will test this on Debian lenny to see if the Debian version makes a
> difference.
>
> Best regards,
> Florian

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users