On 03/09/2013, at 4:32 PM, Ulrich Windl <[email protected]> wrote:

> Hi!
>
> I don't have a real answer for this, but I can report another bad experience
> with a 2-node cluster like yours:
>
> If the DC is fenced, the other node tries to become DC, but if the other node
> (who still thinks he's DC) reboots just before the other node has completed
> his "ego trip", both nodes cannot agree on who's becoming DC. I'll have to
> reboot (or shut down OpenAIS) on one of these nodes. Seen in SLES11 SP2
> (latest updates).

I hope you've reported that to SUSE.
Definitely should not be happening.

>
> An idea for your problem: If the cluster would count "reboots within a
> timeframe" (e.g. as node attribute), the fencing operation could change from
> reboot to poweroff. I don't know how to do it, though.
>
> Regards,
> Ulrich
>
> >>> Alex Sudakar <[email protected]> wrote on 03.09.2013 at 05:23 in
> message
> <calq2s-hxkq5ghv9bs1snnojk4gtnl1su-nzujpdxwosv2ap...@mail.gmail.com>:
>> I've got a very simple question which I suspect betrays my lack of
>> understanding of something basic. Could someone help me understand?
>>
>> If I have a two-node Pacemaker cluster - say, a really simple cluster
>> of two nodes, A & B, with a solitary network connection between them -
>> then I have to set no-quorum-policy to 'ignore'. If the network
>> connection is broken then both A & B will attempt to STONITH each
>> other.
>>
>> Is there anything that would stop an endless cycle of each killing the
>> other if the actions of the STONITH agents are set to reboot?
>>
>> I.e.:
>>
>> - A & B race to STONITH each other
>> - A kills B
>> - A assumes resources
>>
>> - B reboots
>> - B can't see A
>> - B kills A
>> - B assumes resources
>>
>> - A reboots
>> - A can't see B
>> - A kills B
>> - A assumes resources
>>
>> ... etc.
>>
>> It's to stop this sort of cycle that I've set my STONITH actions to
>> 'off' rather than 'reboot'.
>>
>> But I was reading the 'Fencing topology' document that Digimer
>> referenced and I was reminded in my perusal that many people/clusters
>> use a 'reboot' action.
>>
>> For a simple quorum-less cluster of two nodes how do those clusters
>> avoid a never-ending cycle of each node killing the other, if neither
>> node can 'see' the other via corosync?
>>
>> It's a very basic question; I think I'm forgetting something obvious.
>> Thanks for any help!
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
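
On Ulrich's "count reboots within a timeframe" idea: below is a minimal
sketch of just the decision logic, in Python. It is not a Pacemaker or
fence-agent feature as far as this thread establishes; the class name
FenceActionPolicy and the parameters max_reboots and window_seconds are
hypothetical, and wiring the result into a real fencing setup (for example
via a node attribute, as Ulrich suggests) is left open. It only shows how a
sliding window over recent fence events could switch the action from
'reboot' to 'off' and so break the A-kills-B, B-kills-A cycle.

```python
import time
from collections import deque


class FenceActionPolicy:
    """Hypothetical helper: choose 'reboot' or 'off' for a fence request.

    A node that has already been fenced max_reboots times within the last
    window_seconds is powered off instead of rebooted again.
    """

    def __init__(self, max_reboots=2, window_seconds=600.0):
        self.max_reboots = max_reboots
        self.window_seconds = window_seconds
        self._fence_times = {}  # node name -> deque of fence timestamps

    def next_action(self, node, now=None):
        """Record a fence event for `node` and return the action to use."""
        now = time.monotonic() if now is None else now
        times = self._fence_times.setdefault(node, deque())
        # Forget fence events that have aged out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        action = "reboot" if len(times) < self.max_reboots else "off"
        times.append(now)
        return action
```

With max_reboots=2 and window_seconds=600, fencing node "B" at t=0, t=120
and t=240 yields reboot, reboot, off; a later request at t=2000 yields
reboot again because the earlier events have expired from the window.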
