Re: [Pacemaker] ifdown ethX + corosync + DRBD = split-brain?

Andrew Beekhof Wed, 24 Jul 2013 20:32:18 -0700

On 19/07/2013, at 9:38 PM, "Howley, Tom" <tom.how...@hp.com> wrote:


> Hi,
> 
> I have been doing some testing of a fairly standard pacemaker/corosync setup 
> with DRBD (with resource-level fencing) and have noticed the following in 
> relation to testing network failures:
> 
> - Handling of all ports being blocked is OK, based on hundreds of tests.
> - Handling of cable-pulls seems OK, based on only 10 tests.
> - ifdown ethX leads to split-brain roughly 50% of the time due to two 
> underlying issues:
> 
> 1. corosync (possibly by design) handles loss of network interface 
> differently to other network failures. I can only see this from the point of 
> view of the logs: "[TOTEM ] The network interface is down.", which is 
> different from cable-pull log, where I don't see that message. I'm guessing 
> this as I don't know the code.
> 2. corosync allows a non-quorate partition, in my case a single node, to 
> update the CIB. This behaviour has been previously confirmed in reply to 
> previous mails on this list and it has been mentioned that there may be 
> improvements in this area in the future. This on its own seems like a bug to 
> me.
> 
> My question is: is it possible for me to configure corosync/drbd to handle 
> the ifdown scenario or do I simply have to tell people "do not test with 
> ifdown", as I have seen mentioned in a few places on the web? If I do have to 
> leave out ifdown testing, how can I be sure that I haven't missed out testing 
> some real network failure scenario. 
> 
> I don't have the time to do hundreds of cable-pulls, which is what I'm trying 
> to simulate. I will look into introducing failures via the switch, but 
> ideally I'd like to be able to handle ifdown properly or have a clear answer 
> to my problem.

IIRC ifdown removes the device, whereas a cable pull doesn't.
Blocking all ports is a pretty good approximation of a cable pull (the IP 
address remains intact, but nothing is sent or received).
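
As a rough illustration of the port-blocking approach, here is a minimal Python sketch that builds iptables rules to drop all traffic on one interface while leaving it up with its address. The interface name "eth1" and the helper names are assumptions made for the example, not something taken from your setup, and it only prints the commands unless dry_run is turned off.

```python
import subprocess


def block_commands(iface, undo=False):
    """Build iptables commands that drop all traffic on iface.

    The interface stays up and keeps its IP address, which is what
    makes this resemble a cable pull rather than an ifdown.
    """
    # -I inserts the rule at the top of the chain; -D deletes the same rule.
    action = "-D" if undo else "-I"
    return [
        ["iptables", action, "INPUT", "-i", iface, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-o", iface, "-j", "DROP"],
    ]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "eth1" is a hypothetical interface name; substitute the cluster link.
    apply(block_commands("eth1"))             # simulate the failure
    apply(block_commands("eth1", undo=True))  # restore connectivity
```

Calling apply() with dry_run=False on one node would cut it off, and the undo variant removes the same two rules, so the test can be looped without touching the cable or the switch.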

> 
> I would really appreciate advice on this as it's a serious issue for me.
> 
> Thanks,
> Tom
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


