[Pacemaker] Possible error in RA invocation

Santiago Pérez Thu, 30 Jan 2014 11:55:18 -0800

Hi everyone,

I am running a two-node cluster which hosts two Xen VMs. We're using DRBD, but it's managed directly from Xen.


The configuration of one of these resources is as follows:

primitive xen-vm1 ocf:heartbeat:Xen
        params xmfile="/etc/xen/vm1.cfg"
        op monitor interval="30s"
        op start interval="0" timeout="60s"
        op stop interval="0" timeout="300s"
        op migrate_from interval="0" timeout="240"
        op migrate_to interval="0" timeout="240"
        meta allow-migrate="true" target-role="Started"

I have a problem with the monitor operation. It seems to be working fine... until it doesn't. The cluster can be running for weeks without any failure, but sometimes the monitor operation fails with a really strange error from the resource agent. This is an excerpt from one of the failures:

Jan 28 14:40:20 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 monitor[71] (pid 11756)
Jan 28 14:40:20 xenhost1 lrmd: [3822]: info: operation monitor[71] on xen-vm1 for client 3825: pid 11756 exited with return code 0
Jan 28 15:40:26 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 monitor[71] (pid 18065)
Jan 28 15:40:27 xenhost1 lrmd: [3822]: info: operation monitor[71] on xen-vm1 for client 3825: pid 18065 exited with return code 0
Jan 28 16:40:32 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 monitor[71] (pid 24373)
Jan 28 16:40:32 xenhost1 lrmd: [3822]: info: operation monitor[71] on xen-vm1 for client 3825: pid 24373 exited with return code 0
Jan 28 17:40:38 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 monitor[71] (pid 30686)
Jan 28 17:40:38 xenhost1 lrmd: [3822]: info: operation monitor[71] on xen-vm1 for client 3825: pid 30686 exited with return code 0
Jan 28 18:40:44 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 monitor[71] (pid 4593)
Jan 28 18:40:44 xenhost1 lrmd: [3822]: info: operation monitor[71] on xen-vm1 for client 3825: pid 4593 exited with return code 0
Jan 28 18:55:23 xenhost1 lrmd: [3822]: info: RA output: (xen-vm1:monitor:stderr) /usr/lib/ocf/resource.d//heartbeat/Xen: 71: local:
Jan 28 18:55:23 xenhost1 lrmd: [3822]: info: RA output: (xen-vm1:monitor:stderr) en-list: bad variable name
Jan 28 18:55:23 xenhost1 lrmd: [3822]: info: RA output: (xen-vm1:monitor:stderr)
Jan 28 18:55:23 xenhost1 lrmd: [3822]: info: cancel_op: operation monitor[71] on xen-vm1 for client 3825, its parameters: crm_feature_set=[3.0.6] xmfile=[/etc/xen/vm1.cfg] CRM_meta_name=[monitor] CRM_meta_interval=[30000] CRM_meta_timeout=[20000] cancelled

Jan 28 18:55:23 xenhost1 lrmd: [3822]: info: rsc:xen-vm1 stop[72] (pid 6219)
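
For anyone who wants to check their own logs for the same kind of message, here is a minimal sketch of a filter that pulls the non-empty "RA output" lines out of an lrmd log. It only assumes the line format shown in the excerpt above; the function name ra_output_lines and the idea of feeding the log on standard input are my own choices for illustration, not part of Pacemaker.

```python
import re
import sys

# Matches lrmd "RA output" lines in the format shown in the excerpt above.
# The optional spaces tolerate both "RA output: (" and "RA output:(".
RA_OUTPUT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) lrmd: \[\d+\]: "
    r"info: RA output: ?\((?P<rsc>[^:]+):(?P<op>[^:]+):(?P<stream>\w+)\) ?(?P<msg>.*)$"
)


def ra_output_lines(lines):
    """Yield (timestamp, resource, operation, stream, message) tuples.

    Lines that are not RA output, or whose message part is empty, are skipped.
    """
    for line in lines:
        m = RA_OUTPUT.match(line.rstrip("\n"))
        if m and m.group("msg"):
            yield (m.group("ts"), m.group("rsc"), m.group("op"),
                   m.group("stream"), m.group("msg"))


if __name__ == "__main__":
    # Hypothetical usage: pipe the relevant log file into this script.
    for record in ra_output_lines(sys.stdin):
        print(" | ".join(record))
```

Run against the excerpt above, this should keep only the two stderr lines that carry text and drop the hourly monitor entries, which makes it easier to see how often the agent prints anything at all.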

The machines are very low on resources, and this unnecessary migration is causing problems.

The systems are running Debian Wheezy with pacemaker 1.1.7-1 and resource-agents 3.9.2-5+deb7u1. I don't know yet if there's a problem with the Xen RA, the lrmd service itself or my configuration. I wasn't able to find any information related to this issue. Do you have any idea of what could be causing this? Any help will be appreciated.


Regards,
Santiago

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
