Hi David,

2013/12/19 David Vossel <dvos...@redhat.com>:
>
> ----- Original Message -----
>> From: "Kazunori INOUE" <kazunori.ino...@gmail.com>
>> To: "pm" <pacemaker@oss.clusterlabs.org>
>> Sent: Wednesday, December 18, 2013 4:56:20 AM
>> Subject: [Pacemaker] Question about behavior of the post-failure during the migrate_to
>>
>> Hi,
>>
>> When a node crashed while a VM resource was migrating, the VM ended up
>> running on two nodes. [1]
>> Is this the intended behavior?
>>
>> [1]
>> Stack: corosync
>> Current DC: bl460g1n6 (3232261592) - partition with quorum
>> Version: 1.1.11-0.4.ce5d77c.git.el6-ce5d77c
>> 3 Nodes configured
>> 8 Resources configured
>>
>>
>> Online: [ bl460g1n6 bl460g1n8 ]
>> OFFLINE: [ bl460g1n7 ]
>>
>> Full list of resources:
>>
>>  prmDummy  (ocf::pacemaker:Dummy):          Started bl460g1n6
>>  prmVM2    (ocf::heartbeat:VirtualDomain):  Started bl460g1n8
>>
>>
>> # ssh bl460g1n6 virsh list --all
>>  Id    Name                           State
>> ----------------------------------------------------
>>  113   vm2                            running
>>
>> # ssh bl460g1n8 virsh list --all
>>  Id    Name                           State
>> ----------------------------------------------------
>>  34    vm2                            running
>>
>>
>> [Steps to reproduce]
>> 1) Before migration: vm2 running on bl460g1n7 (DC)
>>
>> Stack: corosync
>> Current DC: bl460g1n7 (3232261593) - partition with quorum
>> Version: 1.1.11-0.4.ce5d77c.git.el6-ce5d77c
>> 3 Nodes configured
>> 8 Resources configured
>>
>>
>> Online: [ bl460g1n6 bl460g1n7 bl460g1n8 ]
>>
>> Full list of resources:
>>
>>  prmDummy  (ocf::pacemaker:Dummy):          Started bl460g1n7
>>  prmVM2    (ocf::heartbeat:VirtualDomain):  Started bl460g1n7
>>
>> ...snip...
>>
>> 2) Migrate the VM resource:
>>
>> # crm resource move prmVM2
>>
>> bl460g1n6 was selected as the migration destination.
>>
>> Dec 18 14:11:36 bl460g1n7 crmd[6928]: notice: te_rsc_command:
>> Initiating action 47: migrate_to prmVM2_migrate_to_0 on bl460g1n7
>> (local)
>> Dec 18 14:11:36 bl460g1n7 lrmd[6925]: info:
>> cancel_recurring_action: Cancelling operation prmVM2_monitor_10000
>> Dec 18 14:11:36 bl460g1n7 crmd[6928]: info: do_lrm_rsc_op:
>> Performing key=47:5:0:ddf348fe-fbad-4abb-9a12-8250f71b075a
>> op=prmVM2_migrate_to_0
>> Dec 18 14:11:36 bl460g1n7 lrmd[6925]: info: log_execute:
>> executing - rsc:prmVM2 action:migrate_to call_id:33
>> Dec 18 14:11:36 bl460g1n7 crmd[6928]: info: process_lrm_event:
>> LRM operation prmVM2_monitor_10000 (call=31, status=1, cib-update=0,
>> confirmed=true) Cancelled
>> Dec 18 14:11:36 bl460g1n7 VirtualDomain(prmVM2)[7387]: INFO: vm2:
>> Starting live migration to bl460g1n6 (using remote hypervisor URI
>> qemu+ssh://bl460g1n6/system ).
>>
>> 3) Then, after the "virsh migrate" call inside VirtualDomain had
>> completed but before the migrate_to action itself completed, I made
>> bl460g1n7 crash.
>>
>> As a result, vm2 was already running on bl460g1n6, but it was also
>> started on bl460g1n8 by Pacemaker. [1]
>
> Oh, wow. I see what is going on. If the migrate_to action fails, we actually
> have to call stop on the target node. I believe we attempt to handle these
> "dangling migrations" already, but something about your situation must be
> different. Can you please create a crm_report so we can have your pengine
> files to test with?
>
> Creating a bug on bugs.clusterlabs.org to track this issue would also be a
> good idea. The holidays are coming up and I could see this getting lost
> otherwise.
>
> Thanks,
> -- Vossel
>
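For reference, a crm_report for an incident like this can be generated with
something along the following lines; the timestamps and archive name below
are only placeholders for the window around the failed migration, not the
exact command used:

# crm_report -f "2013-12-18 14:00:00" -t "2013-12-18 14:30:00" prmVM2-dual-start

crm_report gathers the CIB, the pengine inputs and the logs from the cluster
nodes it can reach into a single archive (prmVM2-dual-start.tar.bz2 in this
example), which should include the pengine files asked for above.
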
I opened a bug in Bugzilla about this:
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5186

I attached a crm_report to the Bugzilla entry; is that enough information?

>
>
>> Dec 18 14:11:49 bl460g1n8 crmd[25981]: notice: process_lrm_event:
>> LRM operation prmVM2_start_0 (call=31, rc=0, cib-update=28,
>> confirmed=true) ok
>>
>>
>> Best Regards,
>> Kazunori INOUE
>>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org