Laurent Neiger wrote:
Hi all,

We're building a 2-node cluster that we'd like to run in active/active mode. Since drbd 8.x this is possible, so we hope to achieve a cluster where the two nodes share the load, rather than only one doing the work while the other sits
in standby, waiting for the primary's failure...
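For context, the drbd side of this is the 8.x dual-primary setup. A minimal resource sketch would look roughly like the following; the hostnames, disks and addresses below are placeholders, not our real values:

resource r0 {
    protocol C;
    startup {
        become-primary-on both;        # let both nodes become Primary at startup
    }
    net {
        allow-two-primaries;           # required for active/active
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
    }
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}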

In order to run drbd in active/active mode, we need a cluster filesystem
that uses a DLM. We chose OCFS2 for its numerous features.
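The corresponding 2-node /etc/ocfs2/cluster.conf is along these lines (maq1 matches our logs below; the second node name and the IP addresses are only indicative):

node:
    ip_port = 7777
    ip_address = 192.168.0.1
    number = 0
    name = maq1
    cluster = ocfs2

node:
    ip_port = 7777
    ip_address = 192.168.0.2
    number = 1
    name = maq2
    cluster = ocfs2

cluster:
    node_count = 2
    name = ocfs2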

But we encountered a problem: when we cut the network link of, say, node2 to simulate a crash, we managed to have node1 fence node2 via drbd in order to avoid a split-brain configuration, but node1 then self-fenced,
apparently because of ocfs2.

If you cut the comm link between two nodes in a 2 node cluster, the lower node number survives and the higher node number fences. So I am not sure why node1
fenced. Do you have the netconsole logs? That should tell us something.

After some research, we understood this seems to be "normal" behaviour:
in a 2-node cluster, when communication (and therefore the ocfs2 heartbeat) is lost,
the remaining node has no way to know whether its peer is down or whether
it is the one cut off from the cluster, so it self-fences.

Our first idea was therefore to add a third node, so that the two remaining nodes
can still communicate and no ocfs2 self-fencing is triggered.

But the ocfs2 heartbeat, as explained in the FAQ, writes to the heartbeat system file, which has to be shared. If we set up a third node with a small ocfs2 partition and o2cb, it does not appear in the cluster, even when declared as the third node in /etc/ocfs2/cluster.conf (on each node), because the ocfs2 partition on the
third node is not shared, so the ocfs2 heartbeat is not shared either.
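The third node was declared with an extra stanza of this form (the name and address are again only indicative), with node_count raised to 3 on all three machines:

node:
    ip_port = 7777
    ip_address = 192.168.0.3
    number = 2
    name = maq3
    cluster = ocfs2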

If we run ocfs2_hb_ctl -I -d /dev/drbd0 on node0 and node1, we get back
the same heartbeat reference, but a different one on node2 (the third node).
And in /var/log/kern.log on node 0, we have
...
Feb 28 11:46:42 maq1 kernel: ocfs2_dlm: Node 1 joins domain FB305B8298D94DCA9F9BF75D0AA09B8D
Feb 28 11:46:42 maq1 kernel: ocfs2_dlm: Nodes in domain ("FB305B8298D94DCA9F9BF75D0AA09B8D"): 0 1

But nothing about node2...

And we cannot share a common partition as drbd only works with 2 peers...

Would anyone have a hint about how we could solve this issue?

Is there a way to make a 2-node ocfs2 cluster work,
or must we have at least 3 nodes?
And if 3 nodes are required, how do we make that work with DRBD?
Or, in a 2-node configuration, can we disable self-fencing (but is that desirable)?
No, one does not have to have 3 nodes when one only wants 2 nodes.

If I had to speculate, I would think that node1 fenced because it could not
complete the disk hb io within the hb timeout. If all you severed was the comm link, then that should not affect the disk hb traffic. That is, unless the comm link is
also being used by drbd (via linux-ha)....
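If the disk hb is the problem, one thing worth checking is the o2cb heartbeat threshold in the init script config (/etc/sysconfig/o2cb or /etc/default/o2cb depending on the distro). If I recall the FAQ correctly, the node fences after roughly (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds of missed disk heartbeats, so raising it buys more slack; the values below are only an example:

# example /etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=31    # roughly 60 seconds before self-fencing

But that only delays the fence; it does not change the quorum behaviour of a 2-node cluster.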


