Hi Andreas,

Below is the grep you requested. While looking into this problem I also ran into something odd in the VirtualDomain resource agent. My /etc/libvirt/qemu directory is an NFS share, and VirtualDomain sometimes complains that it can't read /etc/libvirt/qemu/test.xml, which is a bit puzzling. Looking at the test logic in the VirtualDomain file, I found this code:

    # check if we can read the config file (otherwise we're unable to
    # deduce $DOMAIN_NAME from it, see below)
    if [ ! -r $OCF_RESKEY_config ]; then
        if ocf_is_probe; then
            ocf_log info "Configuration file $OCF_RESKEY_config not readable during probe."
        else
            ocf_log error "Configuration file $OCF_RESKEY_config does not exist or is not readable."
            return $OCF_ERR_INSTALLED
        fi
    fi

The problem here is that the -r operator returns true for any readable path, whether $OCF_RESKEY_config is a regular file or a directory. Shouldn't this be a -f check followed by the -r check?

thanks
fil

Dec 07 01:25:53 server01.adriaticsolutions.com pengine: [5297]: info: native_print: vm_test (ocf::adriatic:VirtualDomain): Started server01.adriaticsolutions.com
Dec 07 01:25:53 server01.adriaticsolutions.com lrmd: [5295]: info: cancel_op: operation monitor[10] on ocf::VirtualDomain::vm_test for client 5298, its parameters: CRM_meta_timeout=[30000] depth=[0] CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml] CRM_meta_interval=[10000] hypervisor=[qemu:///system] CRM_meta_depth=[0] migration_transport=[tcp] cancelled
Dec 07 01:25:53 server01.adriaticsolutions.com lrmd: [5295]: debug: on_msg_perform_op: add an operation operation migrate_to[11] on ocf::VirtualDomain::vm_test for client 5298, its parameters: CRM_meta_timeout=[120000] CRM_meta_name=[migrate_to] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml] CRM_meta_migrate_source=[server01.adriaticsolutions.com] CRM_meta_migrate_target=[server02.adriaticsolutions.com] hypervisor=[qemu:///system] migration_transport=[tcp] to the operation list.
Dec 07 01:25:57 server01.adriaticsolutions.com lrmd: [5295]: debug: on_msg_perform_op: add an operation operation stop[12] on ocf::VirtualDomain::vm_test for client 5298, its parameters: crm_feature_set=[3.0.5] to the operation list.
Dec 07 01:25:58 server01.adriaticsolutions.com pengine: [5297]: info: native_print: vm_test (ocf::adriatic:VirtualDomain): Started server02.adriaticsolutions.com FAILED
Dec 07 01:26:10 server01.adriaticsolutions.com lrmd: [5295]: debug: on_msg_perform_op: add an operation operation start[13] on ocf::VirtualDomain::vm_test for client 5298, its parameters: crm_feature_set=[3.0.5] CRM_meta_name=[start] config=[/etc/libvirt/qemu/test.xml] migration_transport=[tcp] CRM_meta_timeout=[120000] hypervisor=[qemu:///system] to the operation list.
Dec 07 01:26:11 server01.adriaticsolutions.com lrmd: [5295]: debug: on_msg_perform_op: add an operation operation monitor[14] on ocf::VirtualDomain::vm_test for client 5298, its parameters: CRM_meta_timeout=[30000] depth=[0] CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml] CRM_meta_interval=[10000] hypervisor=[qemu:///system] CRM_meta_depth=[0] migration_transport=[tcp] to the operation list.

Dec 7 01:25:53 server01 pengine: [5297]: info: native_print: vm_test#011(ocf::adriatic:VirtualDomain):#011Started server01.adriaticsolutions.com
Dec 7 01:25:53 server01 lrmd: [5295]: info: cancel_op: operation monitor[10] on ocf::VirtualDomain::vm_test for client 5298, its parameters: CRM_meta_timeout=[30000] depth=[0] CRM_meta_name=[monitor] crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/test.xml] CRM_meta_interval=[10000] hypervisor=[qemu:///system] CRM_meta_depth=[0] migration_transport=[tcp] cancelled
Dec 7 01:25:53 server01 VirtualDomain[8680]: INFO: test: Starting live migration to server02.adriaticsolutions.com (using remote hypervisor URI qemu+tcp://server02.adriaticsolutions.com/system ).
Dec 7 01:25:57 server01 VirtualDomain[8680]: INFO: test: live migration to server02.adriaticsolutions.com succeeded.
Dec 7 01:25:57 server01 VirtualDomain[8725]: INFO: Domain name "test" saved to /var/run/heartbeat/rsctmp/VirtualDomain-vm_test.state.
Dec 7 01:25:58 server01 VirtualDomain[8725]: INFO: Domain test already stopped.
Dec 7 01:25:58 server01 pengine: [5297]: info: native_print: vm_test#011(ocf::adriatic:VirtualDomain):#011Started server02.adriaticsolutions.com FAILED

On 12/06/2011 07:56 PM, Andreas Kurz wrote:
> Hello,
>
> On 12/05/2011 05:27 AM, Fil wrote:
>> Hi,
>>
>> I have a 2 node cluster (corosync 1.4.2 pacemaker 1.1.6). I need to
>> control a couple of virtual machines in this cluster and be able to live
>> migrate them between nodes. Up until now all my tests worked, but as
>> soon as I started using the monitor action of VirtualDomain my virtual
>> machines are failing to migrate and sometimes they don't even start
>> cleanly. Every time I need to manually clean up the resource group and
>> then it seems to work. Could you please explain if I need the
>> monitor action and how do I make it work.
>>
>> thanks
>> fil
>>
>> Here are the error messages I get:
>>
>> vm_test_monitor_10000 (node=server02.adriaticsolutions.com, call=46,
>> rc=5, status=complete): not installed
>> vm_test_start_0 (node=server01.adriaticsolutions.com, call=52, rc=5,
>> status=complete): not installed
>
> Any result when doing a grep for "VirtualDomain"? Would be interesting
> what the resource agent is telling us ...
>
> Regards,
> Andreas
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
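
For reference, the -r versus -f question raised at the top of this message can be tried outside the agent with a small standalone script. This is only a sketch under stated assumptions: config_is_readable_file is a hypothetical helper name, not something taken from the VirtualDomain agent, and the temporary paths stand in for the real config file. It shows that -r alone succeeds for a readable directory, while a -f check followed by -r accepts only a readable regular file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the VirtualDomain agent):
# succeed only if the path is a regular file AND is readable.
config_is_readable_file() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Scratch directory and file standing in for the real config path.
tmpdir=$(mktemp -d)
tmpfile="$tmpdir/test.xml"
: > "$tmpfile"

# -r on its own also succeeds for a readable directory.
if [ -r "$tmpdir" ]; then
    echo "-r alone: directory passes"
fi

# The combined check rejects the directory ...
if config_is_readable_file "$tmpdir"; then
    echo "combined: directory passes"
else
    echo "combined: directory rejected"
fi

# ... and accepts the regular file.
if config_is_readable_file "$tmpfile"; then
    echo "combined: file passes"
else
    echo "combined: file rejected"
fi

rm -rf "$tmpdir"
```

Whether this ordering would help with the intermittent NFS read failures is a separate question; the sketch only covers the file-type side of the check.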