Hi,
This problem has not been fixed yet. (2012 Jul 29, 33119da31c)
When stonithd is terminated abnormally, shouldn't crmd restart, just as it
does when lrmd is terminated?
The following patch makes crmd restart when its connection to stonithd
breaks. I confirmed that it fixes the problem, but I cannot fully judge the
extent of its impact...
[root@dev1 pacemaker]# git diff
diff --git a/crmd/te_utils.c b/crmd/te_utils.c
index f6a7550..deb4513 100644
--- a/crmd/te_utils.c
+++ b/crmd/te_utils.c
@@ -83,6 +83,7 @@ tengine_stonith_connection_destroy(stonith_t * st, stonith_event_t *e)
 {
     if (is_set(fsa_input_register, R_ST_REQUIRED)) {
         crm_crit("Fencing daemon connection failed");
+        register_fsa_input(C_FSA_INTERNAL, I_ERROR, NULL);
         mainloop_set_trigger(stonith_reconnect);
     } else {
[root@dev1 pacemaker]#
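For context, here is roughly what the callback looks like with the hunk
applied. This is only a sketch reconstructed from the diff above; the else
branch and the exact surrounding code in crmd/te_utils.c are my assumption,
not the actual file contents.

static void
tengine_stonith_connection_destroy(stonith_t * st, stonith_event_t * e)
{
    if (is_set(fsa_input_register, R_ST_REQUIRED)) {
        crm_crit("Fencing daemon connection failed");
        /* Added line: feed an I_ERROR input into crmd's state machine so
         * crmd recovers itself, as it already does when lrmd dies. */
        register_fsa_input(C_FSA_INTERNAL, I_ERROR, NULL);
        /* Existing behaviour: schedule a reconnect to stonithd. */
        mainloop_set_trigger(stonith_reconnect);
    } else {
        /* Assumed: the disconnect was expected, so only log it. */
        crm_info("Fencing daemon disconnected");
    }
}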
Best regards,
Kazunori INOUE
(12.05.09 16:11), Andrew Beekhof wrote:
On Mon, May 7, 2012 at 7:52 PM, Kazunori INOUE
<inouek...@intellilink.co.jp> wrote:
Hi,
On the Pacemaker-1.1 + Corosync stack, after stonithd restarts following an
abnormal exit, STONITH is no longer performed.
I am using the newest devel code:
- pacemaker : db5e16736cc2682fbf37f81cd47be7d17d5a2364
- corosync : 88dd3e1eeacd64701d665f10acbc40f3795dd32f
- glue : 2686:66d5f0c135c9
* 0. The cluster's state.
[root@vm1 ~]# crm_mon -r1
============
Last updated: Wed May 2 16:07:29 2012
Last change: Wed May 2 16:06:35 2012 via cibadmin on vm1
Stack: corosync
Current DC: vm1 (1) - partition WITHOUT quorum
Version: 1.1.7-db5e167
2 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ vm1 vm2 ]
Full list of resources:
prmDummy (ocf::pacemaker:Dummy): Started vm2
prmStonith1 (stonith:external/libvirt): Started vm2
prmStonith2 (stonith:external/libvirt): Started vm1
[root@vm1 ~]# crm configure show
node $id="1" vm1
node $id="2" vm2
primitive prmDummy ocf:pacemaker:Dummy \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="fence" \
op stop interval="0s" timeout="60s" on-fail="stop"
primitive prmStonith1 stonith:external/libvirt \
params hostlist="vm1" hypervisor_uri="qemu+ssh://f/system" \
op start interval="0s" timeout="60s" \
op monitor interval="3600s" timeout="60s" \
op stop interval="0s" timeout="60s"
primitive prmStonith2 stonith:external/libvirt \
params hostlist="vm2" hypervisor_uri="qemu+ssh://g/system" \
op start interval="0s" timeout="60s" \
op monitor interval="3600s" timeout="60s" \
op stop interval="0s" timeout="60s"
location rsc_location-prmDummy prmDummy \
rule $id="rsc_location-prmDummy-rule" 200: #uname eq vm2
location rsc_location-prmStonith1 prmStonith1 \
rule $id="rsc_location-prmStonith1-rule" 200: #uname eq vm2 \
rule $id="rsc_location-prmStonith1-rule-0" -inf: #uname eq vm1
location rsc_location-prmStonith2 prmStonith2 \
rule $id="rsc_location-prmStonith2-rule" 200: #uname eq vm1 \
rule $id="rsc_location-prmStonith2-rule-0" -inf: #uname eq vm2
property $id="cib-bootstrap-options" \
dc-version="1.1.7-db5e167" \
cluster-infrastructure="corosync" \
no-quorum-policy="ignore" \
stonith-enabled="true" \
startup-fencing="false" \
stonith-timeout="120s"
rsc_defaults $id="rsc-options" \
resource-stickiness="INFINITY" \
migration-threshold="1"
* 1. Terminate stonithd forcibly.
[root@vm1 ~]# pkill -9 stonithd
* 2. Trigger STONITH; stonith-ng reports that no matching device is found,
so fencing is not performed.
[root@vm1 ~]# ssh vm2 'rm /var/run/Dummy-prmDummy.state'
[root@vm1 ~]# grep Found /var/log/ha-debug
May 2 16:13:07 vm1 stonith-ng[15115]: debug: stonith_query: Found 0
matching devices for 'vm2'
May 2 16:13:19 vm1 stonith-ng[15115]: debug: stonith_query: Found 0
matching devices for 'vm2'
May 2 16:13:31 vm1 stonith-ng[15115]: debug: stonith_query: Found 0
matching devices for 'vm2'
May 2 16:13:43 vm1 stonith-ng[15115]: debug: stonith_query: Found 0
matching devices for 'vm2'
(snip)
[root@vm1 ~]#
After stonithd restarts, it seems that either the STONITH resource or lrmd
needs to be restarted... is this the designed behavior?
No, that sounds like a bug.
# crm resource restart <STONITH resource (prmStonith2)>
or
# /usr/lib64/heartbeat/lrmd -r (on the node which stonithd rebooted)
----
Best regards,
Kazunori INOUE
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org