Hi,
(12.04.07 06:19), David Vossel wrote:
----- Original Message -----
From: "Kazunori INOUE"<inouek...@intellilink.co.jp>
To: "pacemaker@oss"<pacemaker@oss.clusterlabs.org>
Cc: koi...@intellilink.co.jp
Sent: Thursday, April 5, 2012 10:08:44 PM
Subject: [Pacemaker] on-fail is not effective
Hi,
I am using Pacemaker-1.1 (devel:
7172b7323bb72c51999ce11c6fa5d3ff0a0a4b4f).
The "on-fail" setting does not take effect.
For example, the default action ("restart") is applied even when
"stop" is specified.
The resource is stopping, but if there is nothing to prevent the resource from
starting again it will start after the stop action has completed. This is
probably why 'restart' and 'stop' appear to have the same behavior.
-- Vossel
Is this the specified behavior?
I tested the same configuration in Pacemaker-1.0.
As I suspected, the resource behaves differently there than in Pacemaker-1.1.
When the monitor operation (on-fail="stop") fails:
- Pacemaker-1.0:
the resource stopped and did not start elsewhere.
- Pacemaker-1.1:
the resource stopped and started again on a different node.
---- ----
Configuration:
property no-quorum-policy="ignore" \
stonith-enabled="false" \
startup-fencing="false"
rsc_defaults resource-stickiness="INFINITY" \
migration-threshold="1"
primitive prmDummy1 ocf:pacemaker:Dummy \
op start timeout="90s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="stop" \
op stop timeout="100s" on-fail="block"
---- ----
State of Pacemaker-1.0:
# crm_mon -rf1
============
Last updated: Mon Apr 9 11:35:02 2012
Stack: Heartbeat
Current DC: vm2 (f370d087-433e-462e-8b83-d4a6c13219fa) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ vm1 vm2 ]
Full list of resources:
prmDummy1 (ocf::pacemaker:Dummy): Started vm1
Migration summary:
* Node vm1:
* Node vm2:
I forced 'monitor' to fail:
# /bin/rm -f /var/run/Dummy-prmDummy1.state
# crm_mon -rf1
============
(snip)
Full list of resources:
prmDummy1 (ocf::pacemaker:Dummy): Stopped
Migration summary:
* Node vm1:
prmDummy1: migration-threshold=1 fail-count=1
* Node vm2:
Failed actions:
prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
#
---- ----
State of Pacemaker-1.1:
# crm_mon -rf1
============
Last updated: Mon Apr 9 13:03:34 2012
Last change: Mon Apr 9 13:03:13 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm2 (f370d087-433e-462e-8b83-d4a6c13219fa) - partition with quorum
Version: 1.1.8-1.el6-0cff1b528574f280a28c030034acabee56004f0f
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ vm2 vm1 ]
Full list of resources:
prmDummy1 (ocf::pacemaker:Dummy): Started vm1
Migration summary:
* Node vm2:
* Node vm1:
# /bin/rm -f /var/run/Dummy-prmDummy1.state
# crm_mon -rf1
============
(snip)
Online: [ vm2 vm1 ]
Full list of resources:
prmDummy1 (ocf::pacemaker:Dummy): Started vm2
Migration summary:
* Node vm2:
* Node vm1:
prmDummy1: migration-threshold=1 fail-count=1
Failed actions:
prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
#
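To make the comparison above easier to repeat, here is a minimal sketch of a helper that pulls the state of one resource out of crm_mon one-shot output. The function name rsc_state is hypothetical, and the parsing assumes the resource line has the "name (class::provider:type): state" layout shown in the output above.

```shell
#!/bin/sh
# rsc_state NAME (hypothetical helper): read `crm_mon -1` style output on
# stdin and print whatever follows "NAME (...):" on the resource line,
# for example "Started vm1" or "Stopped".
rsc_state() {
    awk -v rsc="$1" '
        # Match only the resource line: first field is the resource name,
        # second field starts with "(" (the agent specification).
        $1 == rsc && $2 ~ /^\(/ {
            sub(/^[^)]*\):[ \t]*/, "")   # drop "name (class::provider:type):"
            print
            exit
        }'
}
```

It could be used as `crm_mon -rf1 | rsc_state prmDummy1` before and after removing the state file, so that the two versions can be compared on a single line of output.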
Best Regards,
Kazunori INOUE
[root@vm1 ~]# crm configure show | grep -A3 "primitive prmDummy1"
primitive prmDummy1 ocf:pacemaker:Dummy \
op start interval="0" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="stop" \
op stop interval="0" timeout="60s" on-fail="block"
[root@vm1 ~]#
[root@vm1 ~]# crm_mon -f1
============
Last updated: Fri Apr 6 10:13:14 2012
Last change: Fri Apr 6 10:12:42 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
Version: 1.1.7-7172b73
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ vm1 vm2 ]
prmDummy1 (ocf::pacemaker:Dummy): Started vm1
Migration summary:
* Node vm1:
* Node vm2:
[root@vm1 ~]#
[root@vm1 ~]# rm -f /var/run/Dummy-prmDummy1.state
[root@vm1 ~]# crm_mon -f1
============
Last updated: Fri Apr 6 10:13:33 2012
Last change: Fri Apr 6 10:12:42 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
Version: 1.1.7-7172b73
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ vm1 vm2 ]
prmDummy1 (ocf::pacemaker:Dummy): Started vm2
Migration summary:
* Node vm1:
prmDummy1: migration-threshold=1 fail-count=1
* Node vm2:
Failed actions:
prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
[root@vm1 ~]#
The attached gdb_pengine.log is a gdb log taken at the time of the monitor
failure.
Is it because the 2nd argument (variable 'key') of the find_rsc_op_entry()
function is "prmDummy1_last_failure_0"?
With that key, it seems that "on-fail" cannot be identified. (L117~L205 of the log)
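To illustrate the suspicion, here is a minimal sketch of why such a key would not line up with the configured monitor operation. It assumes that operation keys have the form <resource>_<operation>_<interval in ms>, as in "prmDummy1_monitor_10000" from the failed-actions output, and that the lookup is an exact key comparison. The helper names op_key and key_matches_op are hypothetical, and this is not the actual logic of find_rsc_op_entry().

```shell
#!/bin/sh
# op_key RSC TASK INTERVAL_MS (hypothetical helper): build a key of the
# assumed form <resource>_<operation>_<interval in ms>.
op_key() {
    printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# key_matches_op KEY RSC TASK INTERVAL_MS (hypothetical helper): succeed
# only if KEY is exactly the key of the given operation.
key_matches_op() {
    [ "$1" = "$(op_key "$2" "$3" "$4")" ]
}
```

Under these assumptions, `key_matches_op prmDummy1_monitor_10000 prmDummy1 monitor 10000` succeeds, while `key_matches_op prmDummy1_last_failure_0 prmDummy1 monitor 10000` fails, so the on-fail="stop" of the monitor op would not be found and the default would apply.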
Best Regards,
Kazunori INOUE
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org