Alex,

Thanks for the info. Would you by chance mind posting your mon script?

-Josh

On Wed, 2009-12-09 at 21:22 -0500, Alex Dean wrote:
> On Dec 9, 2009, at 5:34 PM, Mullis, Josh (CCI - Atlanta) wrote:
>
> > Shouldn't node1 release the resource if the ping node (1.1.1.1) is
> > down?
>
> That's not how ipfail works.
>
> ipfail presumes that the two nodes are always in contact. Based on
> their ability to ping 1.1.1.1, they will decide which one should hold
> your resources. If the two nodes lose contact with each other, you
> have a split-brain and all bets are off.
>
> "Note that ipfail needs redundant communications media to work
> correctly - because it won't cause a failover on its own unless it can
> contact the other cluster member. In other words, if you're pinging on
> the same media as the only heartbeat channel configured, you're
> destined to be disappointed in ipfail."
> http://linux-ha.org/ipfail
>
> If your ethernet connection is the only medium your cluster nodes can
> use to communicate, ipfail really isn't much use. You could try
> adding something like mon. I've written a mon alert which causes
> heartbeat to go standby if it can't ping its gateway IP, and this has
> worked pretty well. Mon's really quite easy to learn, and I think it
> only took an afternoon of tinkering to get a 'go standby' action I was
> happy with.
>
> http://linux-ha.org/mon
> http://mon.wiki.kernel.org/index.php/Main_Page
>
> You could also switch to a v2 heartbeat+pacemaker configuration, which
> will get you resource-level monitoring. In this case, the ability to
> ping 1.1.1.1 is your 'resource'. I believe you'd then use pingd
> rather than ipfail. I haven't done this personally, but I'm sure
> many/most on this list have.
>
> alex
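
For anyone following the thread, here is a rough sketch of the "go standby if the gateway stops answering" idea described in the quoted message. It is not Alex's script and not a real mon alert; it is a standalone check written under several assumptions: the gateway address 192.0.2.1 is a placeholder, the hb_standby path varies by distribution (it may live under /usr/lib/heartbeat or /usr/lib64/heartbeat instead), and the function names and retry count are made up for illustration. The ping flags are the Linux iputils ones.

```python
import subprocess

# Assumption: placeholder gateway address, replace with your own.
GATEWAY = "192.0.2.1"
# Assumption: location of Heartbeat's hb_standby helper; check your install.
HB_STANDBY = "/usr/share/heartbeat/hb_standby"


def gateway_reachable(host, attempts=3, timeout_s=2, run=subprocess.call):
    """Return True as soon as one ping to host succeeds."""
    for _ in range(attempts):
        # -c 1: send one echo request; -W: seconds to wait for the reply.
        rc = run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if rc == 0:
            return True
    return False


def check_and_standby(host=GATEWAY, run=subprocess.call):
    """Ask heartbeat to go standby if host is unreachable.

    Returns True if standby was requested, False otherwise.
    """
    if gateway_reachable(host, run=run):
        return False
    # Every ping failed: hand resources over to the other node.
    run([HB_STANDBY])
    return True
```

Calling check_and_standby() from cron or a small loop on each node would approximate the behaviour. With mon, the ping test would normally be mon's own monitor, and only the hb_standby call would go into the alert script.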
