On Mon, May 7, 2012 at 7:52 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
> Hi,
>
> On the Pacemaker-1.1 + Corosync stack, although stonithd respawns
> after an abnormal exit, STONITH is not performed after that.
>
> I am using the newest devel.
> - pacemaker : db5e16736cc2682fbf37f81cd47be7d17d5a2364
> - corosync  : 88dd3e1eeacd64701d665f10acbc40f3795dd32f
> - glue      : 2686:66d5f0c135c9
>
>
> * 0. cluster's state.
>
> [root@vm1 ~]# crm_mon -r1
> ============
> Last updated: Wed May 2 16:07:29 2012
> Last change: Wed May 2 16:06:35 2012 via cibadmin on vm1
> Stack: corosync
> Current DC: vm1 (1) - partition WITHOUT quorum
> Version: 1.1.7-db5e167
> 2 Nodes configured, unknown expected votes
> 3 Resources configured.
> ============
>
> Online: [ vm1 vm2 ]
>
> Full list of resources:
>
>  prmDummy     (ocf::pacemaker:Dummy):         Started vm2
>  prmStonith1  (stonith:external/libvirt):     Started vm2
>  prmStonith2  (stonith:external/libvirt):     Started vm1
>
> [root@vm1 ~]# crm configure show
> node $id="1" vm1
> node $id="2" vm2
> primitive prmDummy ocf:pacemaker:Dummy \
>         op start interval="0s" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="fence" \
>         op stop interval="0s" timeout="60s" on-fail="stop"
> primitive prmStonith1 stonith:external/libvirt \
>         params hostlist="vm1" hypervisor_uri="qemu+ssh://f/system" \
>         op start interval="0s" timeout="60s" \
>         op monitor interval="3600s" timeout="60s" \
>         op stop interval="0s" timeout="60s"
> primitive prmStonith2 stonith:external/libvirt \
>         params hostlist="vm2" hypervisor_uri="qemu+ssh://g/system" \
>         op start interval="0s" timeout="60s" \
>         op monitor interval="3600s" timeout="60s" \
>         op stop interval="0s" timeout="60s"
> location rsc_location-prmDummy prmDummy \
>         rule $id="rsc_location-prmDummy-rule" 200: #uname eq vm2
> location rsc_location-prmStonith1 prmStonith1 \
>         rule $id="rsc_location-prmStonith1-rule" 200: #uname eq vm2 \
>         rule $id="rsc_location-prmStonith1-rule-0" -inf: #uname eq vm1
> location rsc_location-prmStonith2 prmStonith2 \
>         rule $id="rsc_location-prmStonith2-rule" 200: #uname eq vm1 \
>         rule $id="rsc_location-prmStonith2-rule-0" -inf: #uname eq vm2
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-db5e167" \
>         cluster-infrastructure="corosync" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="true" \
>         startup-fencing="false" \
>         stonith-timeout="120s"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="INFINITY" \
>         migration-threshold="1"
>
>
> * 1. terminate stonithd forcibly.
>
> [root@vm1 ~]# pkill -9 stonithd
>
>
> * 2. I trigger STONITH, but stonithd says that no device is found and
> does not fence the node.
>
> [root@vm1 ~]# ssh vm2 'rm /var/run/Dummy-prmDummy.state'
> [root@vm1 ~]# grep Found /var/log/ha-debug
> May 2 16:13:07 vm1 stonith-ng[15115]:    debug: stonith_query: Found 0
> matching devices for 'vm2'
> May 2 16:13:19 vm1 stonith-ng[15115]:    debug: stonith_query: Found 0
> matching devices for 'vm2'
> May 2 16:13:31 vm1 stonith-ng[15115]:    debug: stonith_query: Found 0
> matching devices for 'vm2'
> May 2 16:13:43 vm1 stonith-ng[15115]:    debug: stonith_query: Found 0
> matching devices for 'vm2'
> (snip)
> [root@vm1 ~]#
>
>
> After stonithd respawns, it seems that the STONITH resource or lrmd needs
> to be restarted as well... is this the designed behavior?
No, that sounds like a bug.

>
>   # crm resource restart <STONITH resource (prmStonith2)>
>      or
>   # /usr/lib64/heartbeat/lrmd -r  (on the node on which stonithd was restarted)
>
> ----
> Best regards,
> Kazunori INOUE
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
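
One thing that would help narrow it down: after killing stonithd and letting it
respawn, check whether the devices ever get registered with the new stonith-ng
instance, or whether only the query side is failing. I haven't double-checked
the option names against your exact build, so treat the following as a sketch
(the resource and node names are the ones from your configuration):

  # stonith_admin -L        (list the devices currently registered with stonith-ng)
  # stonith_admin -l vm2    (list the devices stonith-ng believes can fence vm2)

If the registered list stays empty until you run "crm resource restart prmStonith2"
or "lrmd -r", that would confirm the devices are simply never re-registered after
the respawn, which matches the "Found 0 matching devices" messages in your log.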