On 8/27/2010 at 03:22 PM, Michael Smith <msm...@cbnco.com> wrote:
> Hi,
>
> I have a pacemaker setup using the Xen resource agent and I've found
> something weird during migration: if a VM is in the middle of
> live-migrating from node 1 to node 2, and I stop the resource in crm,
> pacemaker forgets about the migration and immediately thinks the resource
> is stopped, although it doesn't actually call the stop action. Meanwhile,
> the migration continues and the VM ends up running on node 2.

I'd actually suggest opening a bug for that:

http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

> This can cause problems: let's say you put both nodes into standby one
> after the other. The cluster starts migrating a VM from node 1 to node 2,
> then thinks it stops the resource when node 2 goes to standby, but the
> migration continues and the VM is left running on node 2.
>
> Later when the nodes are brought out of standby, the cluster starts the VM
> on node 1 and hoses the filesystem.
>
> Is there a way around this? I'm not sure there is a clean way to
> abort a Xen live migration, but even if there were, the cluster isn't
> calling any actions so there'd be no way to trigger the abort.

I don't know offhand if there's a way around this, sorry. Anyone else?

Regards,

Tim

> I've tried with op_defaults record-pending="false" and "true", and with
> and without the monitor op on the Xen resource. Here's part of the log
> from a run with record-pending="false" and the following Xen primitive:
>
> primitive vm-test2 ocf:heartbeat:Xen \
>         meta allow-migrate="true" target-role="Started" \
>         op monitor interval="10" \
>         params xmfile="/etc/xen/vm/vm-test2"
>
>
> Aug 26 15:55:49 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:49 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2 (Started xen-test1 -> xen-test2)
> Aug 26 15:55:52 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
> Aug 26 15:55:52 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2 (Started xen-test1 -> xen-test2)
> Aug 26 15:55:58 xen-test1 lrmd: [5145]: info: rsc:vm-test2:40: migrate_to
>
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: te_rsc_command: Initiating action 27: migrate_to vm-test2_migrate_to_0 on xen-test1 (local)
>
> Aug 26 15:55:58 xen-test1 crmd: [5148]: info: process_lrm_event: LRM operation vm-test2_monitor_10000 (call=39, status=1, cib-update=0, confirmed=true) Cancelled
>
> Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: Starting xm migrate to xen-test2
>
>
> # "crm resource stop vm-test2" was run at this point
>
> Aug 26 15:56:07 xen-test1 crmd: [5148]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=0) : Non-status change
>
> Aug 26 15:56:07 xen-test1 cib: [5144]: info: log_data_element: cib:diff: + <nvpair id="vm-test2-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="added:top" />
>
> Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: xm migrate to xen-test2 succeeded.
>
>
> cluster-glue-1.0.5-0.5.1
> corosync-1.2.1-0.5.1
> pacemaker-1.1.2-0.2.1
> resource-agents-1.0.3-0.3.2
>
>
> Thanks,
> Mike
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker