Hi all,

I suspect that a race condition is possible during live migration of resources.
I put one node into standby mode, which made all resources migrate to the other node. The virtual machines were live-migrated successfully, but were then marked as FAILED almost immediately. The logs show some interesting details:

=========
Jul 4 10:21:48 s01-1 VirtualDomain[22988]: INFO: mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration to s01-0 succeeded.
Jul 4 10:21:48 s01-1 lrmd: [7741]: info: RA output: (mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:migrate_to:stdout) Domain mgmt01.c01.ttc.prague.cz.vds-ok.com has been undefined
Jul 4 10:21:48 s01-0 VirtualDomain[4641]: INFO: mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration from s01-1 succeeded.
Jul 4 10:21:49 s01-0 lrmd: [1927]: info: RA output: (mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:migrate_from:stderr) mgmt01.c01.ttc.prague.cz.vds-ok.com-vm is active on more than one node, returning the default value for <null>
Jul 4 10:21:49 s01-1 crmd: [7744]: info: do_lrm_rsc_op: Performing key=110:695:0:7ae65826-5d35-41c0-945a-8336ecb0bc3c op=mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 )
Jul 4 10:21:49 s01-1 lrmd: [7741]: info: rsc:mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:1006: stop
Jul 4 10:21:49 s01-1 VirtualDomain[24062]: ERROR: Virtual domain mgmt01.c01.ttc.prague.cz.vds-ok.com has no state during stop operation, bailing out.
Jul 4 10:21:49 s01-1 crmd: [7744]: info: process_lrm_event: LRM operation mgmt01.c01.ttc.prague.cz.vds-ok.com-vm_stop_0 (call=1006, rc=0, cib-update=1031, confirmed=true) ok
=========

Note that the line with "is active on more than one node" immediately follows "live migration from s01-1 succeeded" in syslog (in both the local and the remote files), so it was put into the syslog queue right after the former.

From what I understand, lrmd decided to fail the resource only because the 'stop' operation had not yet been run on the other node. If my suspicion is wrong, what else could it be?

The Pacemaker version is 'almost' the 1.1-devel tip.
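For anyone who wants to check the same ordering in their own logs, here is a minimal Python sketch that extracts the relevant events from four of the syslog lines quoted above. The names `LINE_RE`, `classify`, `timeline` and `warning_precedes_stop` are made up for illustration and are not part of Pacemaker or cluster-glue. Syslog timestamps here have one-second resolution, so the sketch relies on line order within the excerpt; it only shows the observed sequence and does not prove a race.

```python
import re

# Four lines copied from the excerpt above (the rest are omitted for brevity).
LOG_EXCERPT = """\
Jul 4 10:21:48 s01-1 VirtualDomain[22988]: INFO: mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration to s01-0 succeeded.
Jul 4 10:21:48 s01-0 VirtualDomain[4641]: INFO: mgmt01.c01.ttc.prague.cz.vds-ok.com: live migration from s01-1 succeeded.
Jul 4 10:21:49 s01-0 lrmd: [1927]: info: RA output: (mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:migrate_from:stderr) mgmt01.c01.ttc.prague.cz.vds-ok.com-vm is active on more than one node, returning the default value for <null>
Jul 4 10:21:49 s01-1 lrmd: [7741]: info: rsc:mgmt01.c01.ttc.prague.cz.vds-ok.com-vm:1006: stop
"""

# Month, day, time, node, program tag (optionally followed by "[pid]"), message.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+([^\s:\[]+)(?:\[\d+\])?:\s+(.*)$"
)


def classify(message):
    """Map a log message to a hypothetical event label, or None."""
    if "live migration to" in message:
        return "migrate_to_done"
    if "live migration from" in message:
        return "migrate_from_done"
    if "is active on more than one node" in message:
        return "multi_active_warning"
    if message.endswith(": stop"):
        return "stop_started"
    return None


def timeline(text):
    """Return (time, node, label) tuples in the order the lines appear."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        time, node, _tag, message = match.groups()
        label = classify(message)
        if label:
            events.append((time, node, label))
    return events


def warning_precedes_stop(events):
    """True if the multi-active warning is logged before the stop begins."""
    labels = [label for _, _, label in events]
    return (
        "multi_active_warning" in labels
        and "stop_started" in labels
        and labels.index("multi_active_warning") < labels.index("stop_started")
    )


EVENTS = timeline(LOG_EXCERPT)

if __name__ == "__main__":
    for time, node, label in EVENTS:
        print(time, node, label)
    print("warning before stop:", warning_precedes_stop(EVENTS))
```

Run against a merged log from both nodes, this makes it easy to see whether the "active on more than one node" message shows up before the source node has started its stop operation.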
cluster-glue is 1.0.7.

I use my own version of the VirtualDomain RA, but it has the same migration logic as the stock one.

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker