Re: [Pacemaker] split brain - after network recovery - resources can still be migrated

Digimer Sat, 25 Oct 2014 14:34:41 -0700

On 25/10/14 05:09 PM, Vladimir wrote:

Hi,


currently I'm testing a 2 node setup using ubuntu trusty.

# The scenario:

All communication links between the 2 nodes are cut off. This results
in a split brain situation and both nodes take their resources online.

When the communication links come back, I see the following behaviour:

On drbd level the split brain is detected and the device is
disconnected on both nodes because of "after-sb-2pri disconnect" and
then it goes to StandAlone ConnectionState.

I'm wondering why pacemaker does not let the resources fail.
It is still possible to migrate resources between the nodes although
they're in StandAlone ConnectionState. After a split brain that's not
what I want.

Is this the expected behaviour? Is it possible to let the resources
fail after the network recovery to avoid further data corruption?

(At the moment I can't use resource or node level fencing in my setup.)

Here is the main part of my config:

#> dpkg -l | awk '$2 ~ /^(pacem|coro|drbd|libqb)/{print $2,$3}'
corosync 2.3.3-1ubuntu1
drbd8-utils 2:8.4.4-1ubuntu1
libqb-dev 0.16.0.real-1ubuntu3
libqb0 0.16.0.real-1ubuntu3
pacemaker 1.1.10+git20130802-1ubuntu2.1
pacemaker-cli-utils 1.1.10+git20130802-1ubuntu2.1

# pacemaker
primitive drbd-mysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op monitor interval="29s" role="Master" \
op monitor interval="30s" role="Slave"

ms ms-drbd-mysql drbd-mysql \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"

Split-brains are prevented by using reliable fencing (aka stonith). You configure stonith in pacemaker (using IPMI/iRMC/iLO/etc, switched PDUs, etc). Then you configure DRBD to use the crm-fence-peer.sh fence-handler and you set the fencing policy to 'resource-and-stonith;'.
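
As a rough sketch of the DRBD side only (not a tested configuration), the relevant part of the resource definition on DRBD 8.4 would look something like the fragment below. The resource name "mysql" is taken from the config quoted above. The handler paths are an assumption based on where the drbd8-utils package usually installs the scripts, so check them on your own nodes. The crm-unfence-peer.sh handler is not mentioned above; it is the usual companion that removes the constraint again once the resync has finished.

```
resource mysql {
  disk {
    # Block I/O and call the fence-peer handler when the peer is lost.
    fencing resource-and-stonith;
  }
  handlers {
    # Assumed install paths; verify where your package puts these scripts.
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... existing net, device, disk and address settings stay as they are ...
}
```

This only helps if stonith is also configured and working in pacemaker; the handler on its own does not power off anything.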

This way, if all links fail, both nodes block and call a fence. The faster one fences (powers off) the slower, and then it begins recovery, assured that the peer is not doing the same.

Without stonith/fencing, there is no defined behaviour. You will get split-brains and that is that. Consider: both nodes lose contact with their peer. Without fencing, both must assume the peer is dead and thus take over resources.

This is why stonith is required in clusters. Even with quorum, you can't assume anything about the state of the peer until it is fenced, so it would only give you a false sense of security.


--
Digimer
Papers and Projects: https://alteeve.ca/w/

What if the cure for cancer is trapped in the mind of a person without access to education?


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
