[Pacemaker] heartbeat:anything resource not stop/monitoring after reboot

David Coulson Thu, 05 Sep 2013 17:03:30 -0700

We patched and rebooted one of our clusters this morning - I verified that pacemaker is the same as previous, plus it matches another similar cluster.


There is a resource in the cluster defined as:


primitive re-named-reload ocf:heartbeat:anything \
        params binfile="/usr/sbin/rndc" cmdline_options="reload"

This is the last resource in a group, after the named:lsb and an ipaddr resource, so named binds to the VIP.

After the reboot the re-named-reload resource is all screwed up. The start seems to work, but the monitor is failing and the stop doesn't work:

Sep 5 11:14:14 dresproddns02 lrmd[82091]: notice: operation_finished: re-named-reload_stop_0:582081 [/usr/lib/ocf/resource.d/heartbeat/anything: line 60: kill: (580334) - No such process ]
Sep 5 11:14:14 dresproddns02 crmd[82092]: notice: process_lrm_event: LRM operation re-named-reload_stop_0 (call=33446, rc=0, cib-update=11044, confirmed=true) ok
Sep 5 11:14:15 dresproddns02 crmd[82092]: notice: process_lrm_event: LRM operation re-named-reload_start_0 (call=33450, rc=0, cib-update=11045, confirmed=true) ok
Sep 5 11:14:15 dresproddns02 lrmd[82091]: notice: operation_finished: re-named-reload_monitor_60000:582121 [/usr/lib/ocf/resource.d/heartbeat/anything: line 60: kill: (582109) - No such process ]
Sep 5 11:14:15 dresproddns02 crmd[82092]: notice: process_lrm_event: LRM operation re-named-reload_monitor_60000 (call=33453, rc=1, cib-update=11046, confirmed=false) unknown error


The ocf-tester fails on both clusters:

ocf-tester -n reload -o binfile="/usr/sbin/rndc" -o cmdline_options="reload" /usr/lib/ocf/resource.d/heartbeat/anything

Beginning tests for /usr/lib/ocf/resource.d/heartbeat/anything...
* rc=1: Monitoring an active resource should return 0
* rc=1: Probing an active resource should return 0
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=1: Monitoring an active resource should return 0
* rc=1: Monitoring an active resource should return 0
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/anything failed 4 tests

So, I guess the question is really - Why is it working at all on the cluster it is working on? The rndc process doesn't hang around for more than a few seconds, so the monitor should never really see it running.
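To illustrate the point about the short-lived process, here is a minimal sketch. It assumes the agent checks liveness with a kill-based probe on the PID it recorded at start, which is what the "kill: (...) - No such process" message from line 60 of the script suggests. The sleep command is only a stand-in for rndc reload.

```shell
# Start a short-lived process in the background and remember its PID,
# roughly as a pidfile-based agent would do on start.
# (sleep is a stand-in for "rndc reload"; this is an assumption for the demo.)
sleep 1 &
pid=$!

# While the process is still alive, a signal-0 probe succeeds.
if kill -0 "$pid" 2>/dev/null; then alive_during=yes; else alive_during=no; fi

# Let the process finish, the way rndc returns after a moment.
wait "$pid"

# After it has exited, the same probe fails with "No such process".
if kill -0 "$pid" 2>/dev/null; then alive_after=yes; else alive_after=no; fi

echo "during=$alive_during after=$alive_after"
```

Any monitor that runs after the command has returned would see the second case, which matches the rc=1 in the logs above.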

I did copy over the heartbeat/anything script from the working environment to the broken one, and we have the same issue.

Short of writing a resource that does a start and forces a rc=0 for stop/monitor, any ideas why this is behaving the way it is?
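For reference, a rough sketch of what such a one-shot resource might look like. This is not the stock anything agent, and every name in it (oneshot_start, oneshot_stop, oneshot_monitor, STATE_FILE, ONESHOT_CMD) is hypothetical. Rather than forcing rc=0 unconditionally, it records a successful start in a state file, so that a probe before start can still report "not running" (rc=7). A real agent would also need meta-data output and action dispatch, and would normally keep its state somewhere that is cleared on reboot.

```shell
# Hypothetical one-shot agent skeleton; names and paths are assumptions.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Assumed state file location and command; adjust for a real cluster.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/re-named-reload.state}"
ONESHOT_CMD="${ONESHOT_CMD:-/usr/sbin/rndc reload}"

oneshot_start() {
    # Run the command once and record success in the state file.
    $ONESHOT_CMD || return $OCF_ERR_GENERIC
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

oneshot_stop() {
    # There is no process to kill; just forget that start ran.
    rm -f "$STATE_FILE"
    return $OCF_SUCCESS
}

oneshot_monitor() {
    # "Running" here means start succeeded since the last stop.
    [ -f "$STATE_FILE" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}
```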


David

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
