I just want to confirm for the benefit of the list archives that downgrading the SUSE kernel to 2.6.16.21-0.25-smp did solve the fencing problem.
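For anyone searching the archives later, a quick way to double-check which kernel a node actually booted after the downgrade (the package name below assumes the SLES kernel-smp flavor):

    # uname -r
    # rpm -q kernel-smp

uname -r should report 2.6.16.21-0.25-smp once the node is running the downgraded kernel.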
Thank you.

John

On Thu, 2007-01-18 at 16:57 -0500, Charlie Sharkey wrote:
> It may be a problem with SLES10. It looks like the latest
> SLES10 kernel patch (2.6.16.27-0.6) has this problem.
>
> Here is the problem as reported by someone earlier:
> http://oss.oracle.com/pipermail/ocfs2-users/2007-January/001181.html
> http://oss.oracle.com/pipermail/ocfs2-users/2007-January/001182.html
>
> Here is a bugzilla entry:
> http://oss.oracle.com/bugzilla/show_bug.cgi?id=835
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of John Lange
> Sent: Thursday, January 18, 2007 4:03 PM
> To: ocfs2-users
> Subject: [Ocfs2-users] ocfs2 keeps fencing all my nodes
>
> I have a 4 node SLES 10 cluster with all nodes attached to a SAN via
> fiber.
>
> The SAN has an EVMS volume formatted with ocfs2. Below is my
> cluster.conf.
>
> I can mount the volume on any single node, but as soon as I mount it
> on a second node, it fences one of the nodes. There is never more than
> one node active at a time.
>
> When I check the status of the nodes (quickly, before they get fenced),
> the status shows they are heartbeating.
>
> # /etc/init.d/o2cb status
> Module "configfs": Loaded
> Filesystem "configfs": Mounted
> Module "ocfs2_nodemanager": Loaded
> Module "ocfs2_dlm": Loaded
> Module "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster ocfs2: Online
> Checking O2CB heartbeat: Active
>
> ========
>
> Here are the logs from two machines (note: these are the logs from both
> machines at the same time, captured via remote syslog on a third
> machine) of what happens when node vs2 is already running and node vs3
> joins the cluster (mounts the ocfs2 file system). In this instance vs3
> gets fenced.
>
> Jan 18 14:52:41 vs2 kernel: o2net: accepted connection from node vs3 (num 2) at 10.1.1.13:7777
> Jan 18 14:52:41 vs3 kernel: o2net: connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:45 vs3 kernel: OCFS2 1.2.3-SLES Thu Aug 17 11:38:33 PDT 2006 (build sles)
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Node 2 joins domain 89FC5CB6C98B43B998AB8492874EA6CA
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2
> Jan 18 14:52:45 vs3 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2
> Jan 18 14:52:45 vs3 kernel: kjournald starting. Commit interval 5 seconds
> Jan 18 14:52:45 vs3 kernel: ocfs2: Mounting device (253,13) on (node 2, slot 0)
> Jan 18 14:52:45 vs3 udevd-event[5542]: run_program: ressize 256 too short
> Jan 18 14:52:51 vs2 kernel: o2net: connection to node vs3 (num 2) at 10.1.1.13:7777 has been idle for 10 seconds, shutting it down.
> Jan 18 14:52:51 vs2 kernel: (0,0):o2net_idle_timer:1314 here are some times that might help debug the situation: (tmr 1169153561.99906 now 1169153571.93951 dr 1169153566.98030 adv 1169153566.98039:1169153566.98040 func (09ab0f3c:504) 1169153565.211482:1169153565.211485)
> Jan 18 14:52:51 vs3 kernel: o2net: no longer connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:51 vs2 kernel: o2net: no longer connected to node vs3 (num 2) at 10.1.1.13:7777
>
> ==========
>
> I previously had configured ocfs2 for userspace heartbeating but
> couldn't get that running, so I reconfigured for disk-based
> heartbeating. Could that now be the cause of this problem?
>
> Where do the nodes write the heartbeats? I see nothing on the ocfs2
> file system.
>
> Also, I have no /config directory that is mentioned in the docs. Is
> that normal?
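Two quick notes on the questions above (general OCFS2 1.2 behavior, not verified against this exact setup): the disk heartbeat is written to a heartbeat system file in the volume's hidden system directory, so it never shows up in a normal directory listing. It can be inspected with debugfs.ocfs2, where /dev/sdX is a placeholder for the actual EVMS device:

    # debugfs.ocfs2 -R "ls -l //" /dev/sdX

As for /config: on 2.6.16-era kernels configfs is mounted at /sys/kernel/config rather than /config, which matches the Filesystem "configfs": Mounted line in the o2cb status output above. Checking should show something like:

    # mount | grep configfs
    configfs on /sys/kernel/config type configfs (rw)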
> Here is /etc/ocfs2/cluster.conf:
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.11
>         number = 0
>         name = vs1
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.12
>         number = 1
>         name = vs2
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.13
>         number = 2
>         name = vs3
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.14
>         number = 3
>         name = vs4
>         cluster = ocfs2
>
> cluster:
>         node_count = 4
>         name = ocfs2
>
> Any tips on how I can go about diagnosing this problem?
>
> Thanks,
> John Lange

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users