Re: [Pacemaker] on-fail is not effective

Kazunori INOUE Mon, 09 Apr 2012 03:00:23 -0700

Hi,

(12.04.07 06:19), David Vossel wrote:

----- Original Message -----

From: "Kazunori INOUE"<inouek...@intellilink.co.jp>
To: "pacemaker@oss"<pacemaker@oss.clusterlabs.org>
Cc: koi...@intellilink.co.jp
Sent: Thursday, April 5, 2012 10:08:44 PM
Subject: [Pacemaker]  on-fail is not effective


Hi,

I am using Pacemaker-1.1 (devel:
7172b7323bb72c51999ce11c6fa5d3ff0a0a4b4f).
The "on-fail" setting does not take effect.
For example, the default action ("restart") is applied even when
"stop" is specified.


The resource is stopping, but if there is nothing to prevent the resource from 
starting again it will start after the stop action has completed. This is 
probably why 'restart' and 'stop' appear to have the same behavior.

-- Vossel

Is this the intended behavior (by specification)?

I tested the same configuration on Pacemaker-1.0.
There the resource behaved as expected, which differs from Pacemaker-1.1.

When a monitor operation with on-fail="stop" fails:
- Pacemaker-1.0:
  the resource stopped and did not start elsewhere.
- Pacemaker-1.1:
  the resource stopped and started again on a different node.
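
The two outcomes above can be sketched as a toy model. This is only an illustration of the observed behavior, not Pacemaker code: the function name node_after_monitor_failure and the stop_is_final switch are hypothetical, and the exclusion of the failed node assumes migration-threshold="1" with fail-count=1, as in the status output.

```python
def node_after_monitor_failure(on_fail, failed_node, online_nodes, stop_is_final):
    """Toy model: return the node hosting the resource after a monitor failure.

    Returns None when the resource stays stopped.
    stop_is_final is a hypothetical switch, not a Pacemaker option.
    """
    if on_fail == "stop" and stop_is_final:
        # Outcome seen on Pacemaker-1.0: stopped and not started elsewhere.
        return None
    # Outcome seen on Pacemaker-1.1: after the stop, the resource is placed again.
    # With migration-threshold="1" and fail-count=1, the failed node is excluded.
    candidates = [n for n in online_nodes if n != failed_node]
    return candidates[0] if candidates else None


nodes = ["vm1", "vm2"]
print(node_after_monitor_failure("stop", "vm1", nodes, stop_is_final=True))   # None
print(node_after_monitor_failure("stop", "vm1", nodes, stop_is_final=False))  # vm2
```

The first call corresponds to what I expected from on-fail="stop"; the second corresponds to what Pacemaker-1.1 actually did.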

---- ----
Configuration:
property no-quorum-policy="ignore" \
        stonith-enabled="false" \
        startup-fencing="false"
rsc_defaults resource-stickiness="INFINITY" \
        migration-threshold="1"
primitive prmDummy1 ocf:pacemaker:Dummy \
        op start timeout="90s" on-fail="restart" \
        op monitor interval="10s" timeout="60s" on-fail="stop" \
        op stop timeout="100s" on-fail="block"

---- ----
State of Pacemaker-1.0:

# crm_mon -rf1
============
Last updated: Mon Apr  9 11:35:02 2012
Stack: Heartbeat
Current DC: vm2 (f370d087-433e-462e-8b83-d4a6c13219fa) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ vm1 vm2 ]

Full list of resources:

 prmDummy1      (ocf::pacemaker:Dummy): Started vm1

Migration summary:
* Node vm1:
* Node vm2:

I made the monitor operation fail:
# /bin/rm -f /var/run/Dummy-prmDummy1.state

# crm_mon -rf1
============
(snip)
Full list of resources:

 prmDummy1      (ocf::pacemaker:Dummy): Stopped

Migration summary:
* Node vm1:
   prmDummy1: migration-threshold=1 fail-count=1
* Node vm2:

Failed actions:
    prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
#

---- ----
State of Pacemaker-1.1:

# crm_mon -rf1
============
Last updated: Mon Apr  9 13:03:34 2012
Last change: Mon Apr  9 13:03:13 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm2 (f370d087-433e-462e-8b83-d4a6c13219fa) - partition with quorum
Version: 1.1.8-1.el6-0cff1b528574f280a28c030034acabee56004f0f
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ vm2 vm1 ]

Full list of resources:

 prmDummy1      (ocf::pacemaker:Dummy): Started vm1

Migration summary:
* Node vm2:
* Node vm1:

# /bin/rm -f /var/run/Dummy-prmDummy1.state
# crm_mon -rf1
============
(snip)
Online: [ vm2 vm1 ]

Full list of resources:

 prmDummy1      (ocf::pacemaker:Dummy): Started vm2

Migration summary:
* Node vm2:
* Node vm1:
   prmDummy1: migration-threshold=1 fail-count=1

Failed actions:
    prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
#

Best Regards,
Kazunori INOUE

[root@vm1 ~]# crm configure show | grep -A3 "primitive prmDummy1"
primitive prmDummy1 ocf:pacemaker:Dummy \
         op start interval="0" timeout="60s" on-fail="restart" \
         op monitor interval="10s" timeout="60s" on-fail="stop" \
         op stop interval="0" timeout="60s" on-fail="block"
[root@vm1 ~]#
[root@vm1 ~]# crm_mon -f1
============
Last updated: Fri Apr  6 10:13:14 2012
Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
Version: 1.1.7-7172b73
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ vm1 vm2 ]

  prmDummy1      (ocf::pacemaker:Dummy): Started vm1

Migration summary:
* Node vm1:
* Node vm2:
[root@vm1 ~]#
[root@vm1 ~]# rm -f /var/run/Dummy-prmDummy1.state
[root@vm1 ~]# crm_mon -f1
============
Last updated: Fri Apr  6 10:13:33 2012
Last change: Fri Apr  6 10:12:42 2012 via cibadmin on vm1
Stack: Heartbeat
Current DC: vm1 (87e0eef1-0d86-4e8a-adfe-51f444a4054f) - partition with quorum
Version: 1.1.7-7172b73
2 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ vm1 vm2 ]

  prmDummy1      (ocf::pacemaker:Dummy): Started vm2

Migration summary:
* Node vm1:
    prmDummy1: migration-threshold=1 fail-count=1
* Node vm2:

Failed actions:
     prmDummy1_monitor_10000 (node=vm1, call=4, rc=7, status=complete): not running
[root@vm1 ~]#

The attached gdb_pengine.log is a gdb log taken at the time of the monitor
failure.
Is the cause that the 2nd argument (variable 'key') of the
find_rsc_op_entry() function is "prmDummy1_last_failure_0"?
With that key, it seems that "on-fail" cannot be identified. (L117~L205)
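
To illustrate the suspicion, here is a minimal sketch of a key-based lookup. It is not the actual find_rsc_op_entry() implementation: the helper names split_op_key and lookup_on_fail, the "<resource>_<operation>_<interval in ms>" key layout, and the fallback to "restart" are assumptions made for this example. The table of operations is taken from the configuration shown above.

```python
# on-fail values from the prmDummy1 configuration, keyed by (operation, interval in ms).
CONFIGURED_OPS = {
    ("start", 0): "restart",
    ("monitor", 10000): "stop",
    ("stop", 0): "block",
}


def split_op_key(rsc_id, key):
    """Split '<rsc_id>_<operation>_<interval_ms>' into (operation, interval_ms)."""
    prefix = rsc_id + "_"
    if not key.startswith(prefix):
        raise ValueError("key %r does not belong to resource %r" % (key, rsc_id))
    operation, interval = key[len(prefix):].rsplit("_", 1)
    return operation, int(interval)


def lookup_on_fail(rsc_id, key, default="restart"):
    """Return the configured on-fail for the key, or the default if nothing matches."""
    return CONFIGURED_OPS.get(split_op_key(rsc_id, key), default)


print(lookup_on_fail("prmDummy1", "prmDummy1_monitor_10000"))   # stop
print(lookup_on_fail("prmDummy1", "prmDummy1_last_failure_0"))  # restart
```

In this sketch the "last_failure" key matches no configured operation, so the lookup falls back to the default, which would explain why on-fail="stop" appears to be ignored.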

Best Regards,
Kazunori INOUE

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

