On 19/07/2013, at 9:38 PM, "Howley, Tom" <tom.how...@hp.com> wrote:
> Hi,
>
> I have been doing some testing of a fairly standard pacemaker/corosync setup
> with DRBD (with resource-level fencing) and have noticed the following in
> relation to testing network failures:
>
> - Handling of all ports being blocked is OK, based on hundreds of tests.
> - Handling of cable-pulls seems OK, based on only 10 tests.
> - ifdown ethX leads to split-brain roughly 50% of the time due to two
>   underlying issues:
>
> 1. corosync (possibly by design) handles loss of network interface
>    differently to other network failures. I can only see this from the point of
>    view of the logs: "[TOTEM ] The network interface is down.", which is
>    different from cable-pull log, where I don't see that message. I'm guessing
>    this as I don't know the code.
> 2. corosync allows a non-quorate partition, in my case a single node, to
>    update the CIB. This behaviour has been previously confirmed in reply to
>    previous mails on this list and it has been mentioned that there may be
>    improvements in this area in the future. This on its own seems like a bug to
>    me.
>
> My question is: is it possible for me to configure corosync/drbd to handle
> the ifdown scenario or do I simply have to tell people "do not test with
> ifdown", as I have seen mentioned in a few places on the web? If I do have to
> leave out ifdown testing, how can I be sure that I haven't missed out testing
> some real network failure scenario.
>
> I don't have the time to do hundreds of cable-pulls, which is what I'm trying
> to simulate. I will look into introducing failures via the switch, but
> ideally I'd like to be able to handle ifdown properly or have a clear answer
> to my problem.

IIRC ifdown removes the device; a cable pull doesn't. Blocking all ports is a
pretty good approximation of a cable pull (the IP address remains intact, but
nothing is sent or received).

>
> I would really appreciate advice on this as it's a serious issue for me.
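
To make the port-blocking approach concrete, here is a rough sketch of how such a test could be driven. It assumes iptables is the tool doing the blocking and that eth1 is the cluster interface; both are assumptions on my part, and the helper names block_rules and run_rules are made up for illustration. It drops all traffic in both directions on the interface while leaving the interface and its IP address in place, which is the property that makes it resemble a cable pull rather than ifdown.

```python
import subprocess


def block_rules(iface, action="-A"):
    """Build iptables commands that drop all traffic on one interface.

    action is "-A" to append the DROP rules (start the simulated outage)
    or "-D" to delete them again (end the outage).
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    return [
        # Drop everything arriving on the interface.
        ["iptables", action, "INPUT", "-i", iface, "-j", "DROP"],
        # Drop everything leaving through the interface.
        ["iptables", action, "OUTPUT", "-o", iface, "-j", "DROP"],
    ]


def run_rules(rules, dry_run=True):
    """Print the commands (dry run) or execute them (requires root)."""
    for cmd in rules:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "eth1" is a placeholder; substitute the interface corosync uses.
    run_rules(block_rules("eth1", "-A"))  # start of simulated outage
    run_rules(block_rules("eth1", "-D"))  # end of simulated outage
```

With dry_run=True this only prints the commands, so it is safe to try before running anything as root on a cluster node.
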
>
> Thanks,
> Tom
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org