Hi,

I have a 2-node cluster (corosync 1.4.2, pacemaker 1.1.6). I need to control a couple of virtual machines in this cluster and be able to live-migrate them between nodes. Up until now all my tests worked, but as soon as I started using the monitor action of VirtualDomain, my virtual machines fail to migrate, and sometimes they don't even start cleanly. Every time, I need to manually clean up the resource group, and then it seems to work. Could you please explain whether I need the monitor action and how to make it work?

Thanks,
fil

Here are the error messages I get:

vm_test_monitor_10000 (node=server02.adriaticsolutions.com, call=46, rc=5, status=complete): not installed
vm_test_start_0 (node=server01.adriaticsolutions.com, call=52, rc=5, status=complete): not installed

Dec 4 23:17:44 server01 pengine: [11186]: notice: unpack_rsc_op: Hard error - vm_test_monitor_10000 failed with rc=5: Preventing vm_test from re-starting on server02.adriaticsolutions.com
Dec 4 23:17:44 server01 pengine: [11186]: WARN: unpack_rsc_op: Processing failed op vm_test_monitor_10000 on server02.adriaticsolutions.com: not installed (5)
Dec 4 23:17:44 server01 pengine: [11186]: notice: unpack_rsc_op: Hard error - vm_test_last_failure_0 failed with rc=5: Preventing vm_test from re-starting on server01.adriaticsolutions.com
Dec 4 23:17:44 server01 pengine: [11186]: WARN: unpack_rsc_op: Processing failed op vm_test_last_failure_0 on server01.adriaticsolutions.com: not installed (5)
Dec 4 23:17:44 server01 pengine: [11186]: info: native_print: vm_test#011(ocf::adriatic:VirtualDomain):#011Started server01.adriaticsolutions.com FAILED
Dec 4 23:17:44 server01 pengine: [11186]: info: get_failcount: vm_test has failed 1 times on server02.adriaticsolutions.com

Here is my config (I am using the latest VirtualDomain from the repo; that is why it is sitting in a different location):

node server01.adriaticsolutions.com \
        attributes standby="off"
node server02.adriaticsolutions.com \
        attributes standby="off"
primitive vm_test ocf:adriatic:VirtualDomain \
        params config="/etc/libvirt/qemu/test.xml" hypervisor="qemu:///system" migration_transport="tcp" \
        meta allow-migrate="true" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op migrate_from interval="0" timeout="120s" \
        op migrate_to interval="0" timeout="120s" \
        op monitor interval="10" timeout="30" depth="0" \
        utilization cpu="1" hv_memory="1024"
group test_grp vm_test \
        meta target-role="Started"
location cli-prefer-test_grp test_grp \
        rule $id="cli-prefer-rule-test_grp" inf: #uname eq server02.adriaticsolutions.com
location cli-prefer-vm_test vm_test \
        rule $id="cli-prefer-rule-vm_test" inf: #uname eq server02.adriaticsolutions.com
location loc_server02 test_grp 1000: server02.adriaticsolutions.com
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1323058516"
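
For reference, rc=5 in the messages above is the OCF "not installed" return code (OCF_ERR_INSTALLED). Below is a minimal sketch of two checks that could be run by hand on each node. It assumes the agent reports this code when the domain XML named in config is not readable on the node, or when the virsh binary is missing. The function names are hypothetical and this is not the agent's own code, only a way to rule out those two causes.

```shell
#!/bin/sh
# OCF return code 5 (OCF_ERR_INSTALLED), reported as "not installed" in the cluster status.
OCF_ERR_INSTALLED=5

# Hypothetical helper: return 0 if the domain XML is readable on this node, 5 otherwise.
check_config() {
    [ -r "$1" ] || return "$OCF_ERR_INSTALLED"
}

# Hypothetical helper: return 0 if the named binary is found in PATH, 5 otherwise.
check_binary_present() {
    command -v "$1" >/dev/null 2>&1 || return "$OCF_ERR_INSTALLED"
}

# Example, to be run on both server01 and server02:
#   check_config /etc/libvirt/qemu/test.xml; echo "config rc=$?"
#   check_binary_present virsh; echo "virsh rc=$?"
```

If either check returns 5 on one node only, that would match the pattern in the log, where the failure is recorded per node and the resource is then prevented from starting there until it is cleaned up.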