19.07.2013 14:38, Howley, Tom wrote:
Hi,

I have been doing some testing of a fairly standard pacemaker/corosync setup 
with DRBD (with resource-level fencing) and have noticed the following in 
relation to testing network failures:

- Handling of all ports being blocked is OK, based on hundreds of tests.
- Handling of cable-pulls seems OK, based on only 10 tests.
- ifdown ethX leads to split-brain roughly 50% of the time due to two 
underlying issues:

1. corosync (possibly by design) handles the loss of a network interface differently from other 
network failures. I can only infer this from the logs: ifdown produces "[TOTEM 
] The network interface is down.", a message I don't see after a cable pull. I'm 
guessing here, as I don't know the code.
2. corosync allows a non-quorate partition, in my case a single node, to update 
the CIB. This behaviour has been confirmed in replies to previous mails on this 
list, and it has been mentioned that there may be improvements in this area in 
the future. On its own this seems like a bug to me.

My question is: is it possible for me to configure corosync/DRBD to handle the ifdown 
scenario, or do I simply have to tell people "do not test with ifdown", as I 
have seen mentioned in a few places on the web? If I do have to leave out ifdown testing, 
how can I be sure that I haven't missed some real network failure scenario?

When you shut down an interface, its IP address is removed. As a result, DRBD 
cannot bind to that IP. In real life this is not going to happen, so just tell 
people "do not test with ifdown".
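
For what it's worth, the port-blocking tests mentioned above can be scripted with 
iptables, which leaves the interface and its IP address in place and so exercises 
much the same failure mode as a cable pull. A rough sketch, assuming corosync's 
default totem ports (UDP 5404-5405; check the mcastport in your corosync.conf) — 
adapt before using:

```shell
#!/bin/sh
# Hypothetical sketch: simulate a totem network failure on one node
# without removing the interface's IP address (unlike ifdown).
# Assumes corosync's default totem ports, UDP 5404-5405; run as root.

# Drop all totem traffic in both directions:
iptables -A INPUT  -p udp --dport 5404:5405 -j DROP
iptables -A OUTPUT -p udp --dport 5404:5405 -j DROP

# ... observe fencing / failover behaviour in the logs ...

# Restore connectivity by removing the same rules:
iptables -D INPUT  -p udp --dport 5404:5405 -j DROP
iptables -D OUTPUT -p udp --dport 5404:5405 -j DROP
```

If DRBD replication runs over the same link, its port (7788 by default, per the 
resource configuration) can be dropped the same way to test resource-level fencing.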

--
WBR,
Viacheslav Dubrovskyi


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
