Re: [Linux-HA] Antw: Quick 'death match cycle' question.

Andrew Beekhof Tue, 03 Sep 2013 21:19:08 -0700

On 03/09/2013, at 4:32 PM, Ulrich Windl <[email protected]> 
wrote:


> Hi!
> 
> I don't have a real answer for this, but I can report another bad experience 
> with a 2-node cluster like yours:
> 
> If the DC is fenced, the surviving node tries to become DC, but if the fenced 
> node (which still thinks it is DC) reboots just before the survivor has 
> completed its "ego trip", the two nodes cannot agree on who becomes DC. I then 
> have to reboot (or shut down OpenAIS on) one of these nodes. Seen in SLES11 SP2 
> (latest updates).

I hope you've reported that to SUSE.
That definitely should not be happening.

> 
> An idea for your problem: If the cluster counted "reboots within a 
> timeframe" (e.g. as a node attribute), the fencing operation could change from 
> reboot to poweroff. I don't know how to do it, though.
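
A minimal sketch of the counting idea above, in Python. It is not a
Pacemaker feature or API. The function name, the window length and the
threshold are all hypothetical, and how the reboot timestamps would be
recorded (for example as a node attribute) is left open, as in the
original suggestion.

```python
from typing import Sequence


def choose_fence_action(reboot_times: Sequence[float], now: float,
                        window_s: float = 600.0, max_reboots: int = 2) -> str:
    """Pick a fencing action from a node's recent reboot history.

    reboot_times: timestamps (seconds) of earlier fencing reboots of the
                  target node; how they are stored is an assumption.
    window_s:     the "timeframe" to look back over (hypothetical default).
    max_reboots:  reboots tolerated inside the window before escalating.
    """
    # Count only reboots that fall inside the look-back window.
    recent = [t for t in reboot_times if 0 <= now - t <= window_s]
    # Too many recent reboots: power the node off instead of rebooting it.
    return "off" if len(recent) >= max_reboots else "reboot"
```

With these defaults, a node rebooted twice within the last ten minutes
would be powered off on the next fencing request rather than rebooted
again.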
> 
> Regards,
> Ulrich
> 
>>>> Alex Sudakar <[email protected]> wrote on 03.09.2013 at 05:23 in
> message
> <calq2s-hxkq5ghv9bs1snnojk4gtnl1su-nzujpdxwosv2ap...@mail.gmail.com>:
>> I've got a very simple question which I suspect betrays my lack of
>> understanding of something basic.  Could someone help me understand?
>> 
>> If I have a two-node Pacemaker cluster - say, a really simple cluster
>> of two nodes, A & B, with a solitary network connection between them -
>> then I have to set no-quorum-policy to 'ignore'.  If the network
>> connection is broken then both A & B will attempt to STONITH each
>> other.
>> 
>> Is there anything that would stop an endless cycle of each killing the
>> other if the actions of the STONITH agents are set to reboot?
>> 
>> I.e.:
>> 
>> -  A & B race to STONITH each other
>> -  A kills B
>> -  A assumes resources
>> 
>> -  B reboots
>> -  B can't see A
>> -  B kills A
>> -  B assumes resources
>> 
>> -  A reboots
>> -  A can't see B
>> -  A kills B
>> -  A assumes resources
>> 
>> ... etc.
>> 
>> It's to stop this sort of cycle that I've set my STONITH actions to
>> 'off' rather than 'reboot'.
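
The cycle listed above can be written down as a toy model in Python. It
is only a sketch of the scenario as described, under these assumptions:
the link stays broken, the cluster stack starts automatically on the
rebooted node, and A wins the first race. The function name and the
event strings are hypothetical and nothing here talks to a real cluster.

```python
from typing import List


def simulate(action: str, max_rounds: int = 5) -> List[str]:
    """Toy model of two nodes fencing each other over a broken link."""
    events: List[str] = []
    survivor, victim = "A", "B"  # assumption: A wins the first race
    for _ in range(max_rounds):
        events.append(f"{survivor} fences {victim} ({action})")
        if action == "off":
            # The victim stays powered off, so nobody is left to retaliate.
            break
        # With 'reboot' the victim comes back, cannot see its peer,
        # and fences it in turn: the roles swap and the cycle continues.
        survivor, victim = victim, survivor
    return events


print(simulate("reboot", max_rounds=3))
print(simulate("off"))
```

In this model 'reboot' alternates A and B until max_rounds is reached,
while 'off' ends after the first fencing.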
>> 
>> But I was reading the 'Fencing topology' document that Digimer
>> referenced and I was reminded in my perusal that many people/clusters
>> use a 'reboot' action.
>> 
>> For a simple quorum-less cluster of two nodes how do those clusters
>> avoid a never-ending cycle of each node killing the other, if neither
>> node can 'see' the other via corosync?
>> 
>> It's a very basic question; I think I'm forgetting something obvious.
>> Thanks for any help!
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected] 
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha 
>> See also: http://linux-ha.org/ReportingProblems 

