On Dec 9, 2009, at 5:34 PM, Mullis, Josh (CCI - Atlanta) wrote:

Shouldn't node1 release the resource if the ping node (1.1.1.1) is down?

That's not how ipfail works.

ipfail presumes that the two nodes are always in contact. Based on their ability to ping 1.1.1.1, they will decide which one should hold your resources. If the two nodes lose contact with each other, you have a split-brain and all bets are off.
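To make that logic concrete, here is a much simplified model in Python. It is not ipfail's actual source. The function name `ipfail_decision` and the "compare how many ping nodes each side can reach" rule are assumptions made for illustration. The point it shows is that the decision needs both nodes' reports, so when they lose contact there is nothing to decide with.

```python
from typing import Optional


def ipfail_decision(in_contact: bool, local_reachable: int, peer_reachable: int) -> Optional[str]:
    """Hypothetical, simplified model of ipfail's choice (illustration only).

    in_contact      -- whether the two cluster nodes can still hear each other
    local_reachable -- how many ping nodes this node can reach
    peer_reachable  -- how many ping nodes the peer reports it can reach

    Returns "local" or "peer" for the node that should hold the resources,
    "no_change" when both sides see the same number of ping nodes, or None
    when the nodes have lost contact (split-brain: no coordinated decision).
    """
    if not in_contact:
        # No working channel to the peer means no negotiation, so this
        # model triggers no failover at all.
        return None
    if local_reachable > peer_reachable:
        return "local"
    if peer_reachable > local_reachable:
        return "peer"
    return "no_change"
```

For example, `ipfail_decision(True, 0, 1)` returns `"peer"`, while `ipfail_decision(False, 0, 1)` returns `None`: node1 cannot reach 1.1.1.1, but without contact to node2 it has no basis for giving up its resources.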

"Note that ipfail needs redundant communications media to work correctly - because it won't cause a failover on its own unless it can contact the other cluster member. In other words, if you're pinging on the same media as the only heartbeat channel configured, you're destined to be disappointed in ipfail."
http://linux-ha.org/ipfail

If your ethernet connection is the only medium your cluster nodes can use to communicate, ipfail really isn't much use. You could try adding something like mon. I've written a mon alert which causes heartbeat to go standby if it can't ping its gateway IP, and this has worked pretty well. Mon's really quite easy to learn, and I think it only took an afternoon of tinkering to get a 'go standby' action I was happy with.
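A rough sketch of the 'go standby' idea, written in Python. This is not my actual alert. In mon the ping check would normally be done by a monitor and the alert would only run the standby command; this sketch folds both steps into one function for brevity. The command lines are assumptions: `ping -c 3 -W 2` is the Linux iputils syntax, and the path to heartbeat's `hb_standby` script is hypothetical and varies between distributions. The `runner` parameter exists only so the logic can be exercised without touching a real cluster.

```python
import subprocess
from typing import Callable, Sequence

# Assumed command lines; adjust for your system.
PING_CMD = ["ping", "-c", "3", "-W", "2"]
# Hypothetical path: where hb_standby lives differs between distributions.
STANDBY_CMD = ["/usr/share/heartbeat/hb_standby"]


def run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit status."""
    return subprocess.call(list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def check_gateway(gateway: str, runner: Callable[[Sequence[str]], int] = run) -> bool:
    """Ping the gateway; if it does not answer, ask heartbeat to go standby.

    Returns True if standby was requested, False if the gateway answered.
    """
    if runner(PING_CMD + [gateway]) == 0:
        return False  # gateway reachable: leave resources where they are
    runner(STANDBY_CMD)  # gateway unreachable: hand resources to the peer
    return True
```

Note that this only helps if the peer can still be reached over some other path; if both nodes lose the gateway, both will try to go standby.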

http://linux-ha.org/mon
http://mon.wiki.kernel.org/index.php/Main_Page

You could also switch to a v2 heartbeat+pacemaker configuration, which will get you resource-level monitoring. In this case, the ability to ping 1.1.1.1 is your 'resource'. I believe you'd then use pingd rather than ipfail. I haven't done this personally, but I'm sure many, if not most, people on this list have.
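As I understand the pingd approach, each node publishes an attribute equal to a multiplier times the number of ping nodes it can reach, and a location rule keeps the resource off any node where that value is zero and prefers the node with the highest value. The Python below is only a model of that placement logic under those assumptions (a multiplier of 100, hypothetical function names); in a real cluster this is configured with the pingd resource and location constraints, not written as code.

```python
from typing import Dict, Optional

NEG_INF = float("-inf")


def pingd_score(reachable: int, multiplier: int = 100) -> int:
    """Attribute value a node would publish: multiplier x reachable ping nodes."""
    return reachable * multiplier


def place_resource(reachable_by_node: Dict[str, int], multiplier: int = 100) -> Optional[str]:
    """Pick the node that should run the resource (simplified model).

    Nodes whose score is 0 are banned (-INFINITY); otherwise the highest
    score wins. On a tie this returns the first node in dict order (a real
    cluster would apply resource stickiness instead). Returns None if every
    node is banned, i.e. the resource stays stopped.
    """
    scores = {}
    for node, reachable in reachable_by_node.items():
        value = pingd_score(reachable, multiplier)
        scores[node] = NEG_INF if value <= 0 else value
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == NEG_INF:
        return None
    return best
```

With node1 unable to reach 1.1.1.1 and node2 able to, `place_resource({"node1": 0, "node2": 1})` picks `"node2"`, which is the behaviour you were expecting from ipfail.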

alex


_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
