Hi again!
I tried to think my setup through again, but I'm still not coming to any
sensible conclusion.
The stonith:suicide resource was set up as a clone resource, because that's
how it's done in all the examples I found. Well - I didn't find a single
example for "suicide" itself, but that's at least how it's done for the other
stonith agents.
Could that be my error? Shouldn't the suicide resource be stopped on all
nodes *with* quorum and be started only on the nodes which have *no*
quorum? If I'm right, how is that accomplished?
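(For reference, this is how I've been checking where the clone instances
actually run - assuming the crm shell that ships with pacemaker; the resource
name is the one from my config below:)

```shell
# Show the current state of the cloned fencing resource on all nodes
crm resource status fenc_clon

# One-shot cluster overview; the header also shows whether the
# partition currently has quorum
crm_mon -1
```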
Strangely, according to the error messages in my logs (/var/log/messages), my
disconnected system (mgmt03) is trying to stonith one (yes, only one; it
always tries mgmt01, never mgmt02) of the other systems!
Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_query_timeout:
Query f7cbd271-ffa2-4015-a132-0107517d2ea1 for mgmt01 timed out
Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_timeout: Action
poweroff (f7cbd271-ffa2-4015-a132-0107517d2ea1) for mgmt01 timed out
Feb 24 17:28:43 mgmt03 crmd: [5911]: ERROR: tengine_stonith_callback: Stonith
of mgmt01 failed (-7)... aborting transition.
Looking at the "warn" messages, one can see that stonith somehow wants to kill
*all* nodes:
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt01
for STONITH
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt02
for STONITH
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt03
for STONITH
And "info" reveals that stonith indeed tries to kill mgmt01:
Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: log_data_element:
stonith_query: Query <stonith_command t="stonith-ng"
st_async_id="872fdb20-c172-417e-9a21-1233abc5a91a" st_op="st_query"
st_callid="0" st_callopt="0" st_remote_op="87
2fdb20-c172-417e-9a21-1233abc5a91a" st_target="mgmt01"
st_device_action="poweroff" st_clientid="940dcf86-d33a-4cb
d-a9ea-1054af0b5e33" src="mgmt03" seq="1467" />
Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: can_fence_host_with_device:
suicide_res:2 can not fence mgmt01: dynamic-list
Something is obviously going entirely wrong here...
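(One thing I'm wondering about: the "dynamic-list" in the last log line is the
host check mode of the device. Maybe I need to tell the device explicitly
which hosts it may fence via the pcmk_host_check/pcmk_host_list parameters;
the following is just a guess on my part, and I'm not sure listing all three
nodes is even right for a suicide agent that can only kill its local node:)

```
primitive suicide_res stonith:suicide \
        params pcmk_host_check="static-list" \
               pcmk_host_list="mgmt01 mgmt02 mgmt03"
```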
If anyone of you has a functioning suicide-stonith setup running, please
let me know how you did it.
See below for my configuration (again).
Thanks in advance,
Andreas
~~~~~~Output from crm configure show~~~~~~~~~~
primitive suicide_res stonith:suicide ...
clone fenc_clon suicide_res
...
property $id="cib-bootstrap-options" \
dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
cluster-infrastructure="openais" \
expected-quorum-votes="3" \
stonith-enabled="true" \
no-quorum-policy="suicide" \
stonith-action="poweroff"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
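(To see which devices stonith-ng thinks can fence a given node, I've also
been querying it directly with stonith_admin, which comes with pacemaker:)

```shell
# List all fencing devices currently registered with stonith-ng
stonith_admin --list-registered

# Ask which registered devices claim they can fence mgmt01
stonith_admin --list mgmt01
```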
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems