Hi again!
I tried to think my setup through again, but I'm still not coming to any
sensible conclusion.
The stonith:suicide resource was set up as a clone resource, because that's
how it's done in all the examples I found. Well - I didn't find a single
example for "suicide" itself, but that's at least how it's done for the other
stonith agents.
Could that be my error? Shouldn't the suicide resource be stopped on all
nodes *with* quorum and be started only on the nodes which have *no*
quorum? If I'm right, how is that accomplished?
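(For reference, this is how I've been checking where the clone instances
actually run - assuming the crm shell that ships with pacemaker; the resource
name is the one from my config below:)

```shell
# Show the current state of the cloned fencing resource on all nodes
crm resource status fenc_clon

# One-shot cluster overview; the header also shows whether the
# partition currently has quorum
crm_mon -1
```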
Strangely, according to the error messages in my logs (/var/log/messages), my
disconnected system (mgmt03) is trying to stonith one (yes, only one; it
always tries mgmt01, never mgmt02) of the other systems!
Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_query_timeout:
Query f7cbd271-ffa2-4015-a132-0107517d2ea1 for mgmt01 timed out
Feb 24 17:28:43 mgmt03 stonith-ng: [5906]: ERROR: remote_op_timeout: Action
poweroff (f7cbd271-ffa2-4015-a132-0107517d2ea1) for mgmt01 timed out
Feb 24 17:28:43 mgmt03 crmd: [5911]: ERROR: tengine_stonith_callback: Stonith
of mgmt01 failed (-7)... aborting transition.
Looking at the "warn" messages, one can see that stonith somehow wants to kill
*all* nodes:
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt01
for STONITH
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt02
for STONITH
Feb 24 17:29:01 mgmt03 pengine: [5910]: WARN: stage6: Scheduling Node mgmt03
for STONITH
And "info" reveals that stonith indeed tries to kill mgmt01:
Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: log_data_element:
stonith_query: Query <stonith_command t="stonith-ng"
st_async_id="872fdb20-c172-417e-9a21-1233abc5a91a" st_op="st_query"
st_callid="0" st_callopt="0" st_remote_op="87
2fdb20-c172-417e-9a21-1233abc5a91a" st_target="mgmt01"
st_device_action="poweroff" st_clientid="940dcf86-d33a-4cb
d-a9ea-1054af0b5e33" src="mgmt03" seq="1467" />
Feb 24 17:29:01 mgmt03 stonith-ng: [5906]: info: can_fence_host_with_device:
suicide_res:2 can not fence mgmt01: dynamic-list
Something is obviously going entirely wrong here...
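(One thing I'm wondering about: the "dynamic-list" in the last log line is the
host check mode of the device. Maybe I need to tell the device explicitly
which hosts it may fence via the pcmk_host_check/pcmk_host_list parameters;
the following is just a guess on my part, and I'm not sure listing all three
nodes is even right for a suicide agent that can only kill its local node:)

```
primitive suicide_res stonith:suicide \
        params pcmk_host_check="static-list" \
               pcmk_host_list="mgmt01 mgmt02 mgmt03"
```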
If anyone of you has a functioning suicide-stonith setup running, please
let me know how you did it.
See below for my configuration (again).
Thanks in advance,
Andreas
~~~~~~Output from crm configure show~~~~~~~~~~
primitive suicide_res stonith:suicide ...
clone fenc_clon suicide_res
...
property $id="cib-bootstrap-options" \
dc-version="1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10" \
cluster-infrastructure="openais" \
expected-quorum-votes="3" \
stonith-enabled="true" \
no-quorum-policy="suicide" \
stonith-action="poweroff"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
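(To see which devices stonith-ng thinks can fence a given node, I've also
been querying it directly with stonith_admin, which comes with pacemaker:)

```shell
# List all fencing devices currently registered with stonith-ng
stonith_admin --list-registered

# Ask which registered devices claim they can fence mgmt01
stonith_admin --list mgmt01
```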
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems