19.07.2013 14:38, Howley, Tom wrote:
Hi,
I have been doing some testing of a fairly standard pacemaker/corosync setup
with DRBD (with resource-level fencing) and have noticed the following in
relation to testing network failures:
- Handling of all ports being blocked is OK, based on hundreds of tests.
- Handling of cable-pulls seems OK, based on only 10 tests.
- ifdown ethX leads to split-brain roughly 50% of the time, due to two
underlying issues:
1. corosync (possibly by design) handles loss of a network interface differently
from other network failures. I can only see this from the logs: ifdown produces
"[TOTEM ] The network interface is down.", a message that never appears after a
cable pull. I'm guessing here, as I don't know the code.
2. corosync allows a non-quorate partition, in my case a single node, to update
the CIB. This behaviour has been confirmed in replies to earlier mails on this
list, along with a mention that there may be improvements in this area in the
future. On its own, this seems like a bug to me.
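(For context, the only related knob I'm aware of is Pacemaker's no-quorum-policy
cluster property, e.g. in crm shell syntax:

```
# Stop all resources in a partition that has lost quorum.
crm configure property no-quorum-policy=stop
```

but as far as I can tell this governs resource management only; it does not
prevent the non-quorate partition from updating its local copy of the CIB.)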
My question is: can I configure corosync/DRBD to handle the ifdown
scenario, or do I simply have to tell people "do not test with ifdown", as I
have seen suggested in a few places on the web? If I do have to leave ifdown
out of testing, how can I be sure that I haven't missed some real network
failure scenario?
When you shut down an interface, its IP address is removed. As a result, DRBD
cannot bind to that address.
In real life, that's not going to happen. So just tell people "do not test
with ifdown".
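You can see the same failure mode with a minimal sketch (Python here just for
illustration; 192.0.2.1, from the RFC 5737 test range, stands in for an address
that ifdown has removed from the host):

```python
import errno
import socket

# `ifdown ethX` removes the interface's IP address. A daemon that binds
# to that specific address, as DRBD does with the address given in its
# resource configuration, then fails with EADDRNOTAVAIL. 192.0.2.1
# (TEST-NET-1, RFC 5737) is assumed not to be configured on any local
# interface.
bind_error = None
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.bind(("192.0.2.1", 7789))  # 7789: a commonly used DRBD port
except OSError as exc:
    bind_error = exc
finally:
    sock.close()

if bind_error is not None:
    # Typically EADDRNOTAVAIL on Linux.
    print("bind failed:", errno.errorcode.get(bind_error.errno, bind_error.errno))
```

Since DRBD binds to the specific address from its resource configuration,
reconnection attempts keep failing this way until the address comes back.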
--
WBR,
Viacheslav Dubrovskyi
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org