On Fri, Feb 25, 2011 at 12:51 PM, Stallmann, Andreas
<[email protected]> wrote:
> Hi!
>
> I am combining both your answers into one mail; I hope that's all right with you.
>
>> >For now, I need an interim solution, which is, as of now, stonith via 
>> >suicide.
>> Doesn't work as suicide is not considered reliable - by definition the 
>> remaining nodes have no way to verify that the fencing operation was 
>> successful.
>> I suspect it will still fail though; suicide isn't a supported fencing option, 
>> since obviously the other nodes can't confirm it happened.
>
> Ok then, I know I'm a little bit provocative right now:
>
> If "suicide" is not a supported fencing option, why is it still included with 
> stonith?

Left over from the heartbeat v1 days, I guess.
It could also be a testing-only device, like ssh.

> It's badly documented, and I didn't find a single (official) document on 
> how to implement a (stable!) suicide stonith,

Because you can't.  Suicide is not, will not be, and cannot be reliable.
The whole point of stonith is to create a known node state (off) in
situations where you cannot be sure if your peer is alive, dead or
some state in-between.

Suicide does not achieve this in any way, shape or form.
It requires a "sick" node to suddenly start functioning correctly.
Attempting to self-terminate makes some sense; relying on it to
succeed does not seem prudent.
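For contrast, a real fencing device gives the surviving nodes confirmation that the power-off actually happened. A minimal sketch in crm shell syntax, using the external/ipmi stonith plugin; all hostnames, addresses, and credentials below are hypothetical and would need to match your hardware:

```shell
# Hypothetical two-node example: a stonith resource that can power off
# node1 via its IPMI interface (addresses and credentials are made up).
crm configure primitive st-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=10.0.0.101 userid=admin passwd=secret \
    op monitor interval=60s
# Never let a node run its own fencing device:
crm configure location l-st-node1 st-node1 -inf: node1
crm configure property stonith-enabled=true
```

With this, node2 (or any surviving peer) runs st-node1 and can verify that node1 was really switched off - which is exactly the confirmation that suicide cannot provide.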

> but it's there, and thus it should be usable. If it isn't, the maintainer 
> should please (please!) remove it or supply something that works. I do 
> know that's quite demanding, because the maintainer probably does the 
> development in his (or her) free time. Still...
>
> I do as well agree, that "suicide" is a very special way of keeping a cluster 
> consistent, very different from the other stonith methods. I wouldn't expect 
> it under stonith, I'd rather think...
>
>> Yes no-quorum-policy=suicide means that all nodes in the partition will end 
>> up being shot, but you still require a real stonith device so that 
>> _someone_else_ can perform it.
> ...that if you set "no-quorum-policy=suicide", the suicide script is executed 
> by the node itself. It should be an *extra* feature *alongside* stonith. The 
> procedure should be something like:
>
> 1) node1: Allright, I have no quorum anymore. Let's wait for a while...
> 2)... a while passes
> 3) node1: OK, I'm still without quorum, no contact to my peers whatsoever. 
> I'd rather shut myself down before I cause a mess.
>
> If, during (2), the other nodes find a way to shut down the node externally 
> (whether through ssh, a power switch, a virtualisation host...), that's even 
> better, because then the cluster "knows" that it's still consistent. I'm 
> with you here.
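For what it's worth, the quorum policy being discussed here is just a cluster property; setting it is one line in crm shell syntax (a sketch, not a recommendation - Pacemaker still expects a real stonith device so the peers can perform the external shutdown during step (2)):

```shell
# Sketch only: nodes in a partition that has lost quorum will try to
# shut themselves down. This complements a real stonith device; it
# does not replace one.
crm configure property no-quorum-policy=suicide
```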
>
> If a split brain happens in a split-site scenario, a "suicide" might be the 
> only way to keep up consistency, because no one will be able to reach any 
> device on the other site... Please correct me if I'm wrong. What do you do in 
> such a case? What's your exemplary implementation of Linux-HA then?
>
> On the other hand, it doesn't make sense to name a "no-quorum-policy" 
> "suicide" if it's anything but a suicide (if anything, one could call it 
> "assisted suicide").
>
> Please correct me: Do I have an utterly wrong understanding of the whole 
> process (that could be very well the case), is the implementation not 
> entirely thought through, or is the naming of certain components not as good 
> as it could be?
>
> I might point you to 
> http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html, 
> because the same thing was discussed then, and I very much think that 
> Lars was right in what he wrote. Has anything changed in the concept 
> of suicide/quorum-loss/stonith since then? That's not a provocative question, 
> well, maybe it is, but it's not meant to be.
>
> In addition: Something that's missing from the manuals is a "case study" (or 
> something of the like) on how to implement a split-site scenario. How should 
> the cluster be built then? If you have two sites? If you have one? How should 
> the storage replication be set up? Is synchronous replication like in drbd 
> really a good idea then, performance-wise? I think I'll finally have to buy a 
> book. :-) Any recommendations (English or German preferred)?
>
> Well, thanks a lot again. My brain didn't explode (that's something good, I 
> feel), but I'm not entirely happy, though.
>
> Cheers and have a nice weekend,
>
> Andreas
>
>
> ------------------------
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
> Geschäftsführer/Managing Directors: Jürgen Zender (Sprecher/Chairman), Anke 
> Höfer
> Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: Hans 
> Jürgen Niemeier
>
> CONET Technologies AG, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 10328 )
> Vorstand/Member of the Managementboard: Rüdiger Zeyen (Sprecher/Chairman), 
> Wilfried Pütz
> Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: Dr. Gerd 
> Jakob
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems