Luis,

thank you for continuing to write to me about this issue.


Luis Freitas wrote:
  There are two heartbeats, the network heartbeat and the disk heartbeat.

  If I am not mistaken, the disk heartbeat is done on the block device that is mounted as an OCFS2 filesystem. So you can decide which node will fence by cutting its access to the disk device.

I didn't know that, and you were absolutely right.
Indeed, if I cut access to the OCFS2 disk device from the node I want to reboot, this prevents the other one
from self-fencing.

  When using a SAN this is kind of simple, since there is an external disk device and one node eventually locks the device and forces the other node to be evicted.

  Since you are using DRBD, you need to make sure that the node that your cluster manager evicts can no longer access the DRBD device. As there are two paths to the DRBD device on each node (one local device and one remote device), I am not exactly sure how you will accomplish this, or whether DRBD already has this kind of control to prevent a split brain, but what you need to do is to block access to the shared disk device on the evicted node before the OCFS2 timeout.

The trick is right there: "what you need to do is to block access to the shared disk device on the evicted node before the OCFS2 timeout".

I was focused on trying to cancel the timeout triggered on node1 once I had worked out that it was node0 that failed. In fact, if I cut node0's disk access _before_ the 7-second (by default) timeout, the self-fencing process
on the higher-numbered node is never initiated, and all is fine.

So what I do to make my 2-node cluster work is to use only the OCFS2 filesystem itself, not its built-in self-fencing process, which I replace with my own. This is the only solution I found to achieve a real HA 2-node cluster that keeps providing services and data access with no interruption, whichever node fails, even if it's the
lower-numbered one.

I think I can make my cluster manager work out whether it has to fence a node before the OCFS2 timeout,
or at worst I'll increase this 7-second default timeout.
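For the record, with the classic o2cb stack the disk-heartbeat dead threshold is a tunable in the o2cb configuration file. The exact file location and semantics depend on the distribution and OCFS2 version, so the fragment below is only a sketch (the path /etc/sysconfig/o2cb and the value 31 are illustrative):

```shell
# Sketch: raising the o2cb disk-heartbeat dead threshold.
# On some distributions the file is /etc/default/o2cb instead.
# The threshold is counted in heartbeat iterations (2 seconds each
# by default), so a larger value gives an external cluster manager
# more time to fence a node before OCFS2 self-fencing kicks in.
#
# In /etc/sysconfig/o2cb:
#   O2CB_HEARTBEAT_THRESHOLD=31
#
# Then restart the stack so the new threshold takes effect:
service o2cb restart
```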

As for how I make sure my cluster manager prevents a node from accessing the DRBD device, even though there
are two paths (local and remote), the answer is simple and radical: I fence it...
You are right, I cannot take the time to shut everything down properly: to shut down access to DRBD I would have to make it secondary, and to do that I would have to unmount the OCFS2 partition, and thus stop all services, etc.
Way too long. Instead I send remote commands over ssh:
"echo o > /proc/sysrq-trigger  2>&1; halt -f 2>&1"
and the node is immediately powered off, just as if I had unplugged the power cord...
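Wrapped in a script, the fence looks roughly like this. The host name "node0", the DRY_RUN guard, and the ConnectTimeout value are my illustrative additions, not part of the original command:

```shell
#!/bin/sh
# Sketch of the ssh-based "power cord" fence described above.
# The sysrq 'o' trigger powers the machine off without any shutdown.
FENCE_CMD='echo o > /proc/sysrq-trigger 2>&1; halt -f 2>&1'

fence_node() {
    target="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show what would run instead of actually powering the node off.
        echo "ssh root@${target} \"${FENCE_CMD}\""
    else
        # ConnectTimeout keeps the fence attempt itself from hanging
        # past the OCFS2 heartbeat window.
        ssh -o ConnectTimeout=3 "root@${target}" "${FENCE_CMD}"
    fi
}

DRY_RUN=1 fence_node node0
```

The dry-run guard is just a safety net for testing the script; in production the cluster manager calls it directly.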

When I power it back on, it simply has to do a resync, but only a differential (and therefore fast) one; and even if it had to be a full one, it wouldn't matter, as the other node is still providing services. There is no split-brain issue: I know my up-to-date node is the other one, and the newly powered-on node has to
be the SyncTarget.
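Normally DRBD works this out by itself after a clean fence: the rejoining node comes back as SyncTarget and the differential resync starts automatically. Should DRBD ever report a split brain anyway, the victim node can be told explicitly to discard its data; the commands below are a sketch assuming DRBD 8.x and a resource named "r0" (both illustrative, and the exact syntax varies between 8.3 and 8.4):

```shell
# On the node that was fenced and powered back on, force it to rejoin
# as SyncTarget and discard whatever it wrote before the fence:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# Watch the (differential) resync progress:
cat /proc/drbd
```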

Once more I'd like to thank you very much, and Sunil too, for your time, help and hints on this issue.

See you,

Laurent.

Laurent Neiger
Administrateur Systèmes & Réseaux
CNRS Grenoble, Centre Réseau & Informatique Commun
25, avenue des Martyrs, B.P. 166, 38042 Grenoble, France
email: [EMAIL PROTECTED]
tel (work): (0033) (0)4 76 88 79 91
fax: (0033) (0)4 76 88 12 95
http://cric.grenoble.cnrs.fr
Certificats : http://igc.services.cnrs.fr/Doc/General/trust.html


_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
