On Wed, Sep 23, 2015 at 01:48:17PM +0100, Paul Carlton wrote:
>
>
> On 22/09/15 16:44, Daniel P. Berrange wrote:
> >On Tue, Sep 22, 2015 at 09:29:46AM -0600, Chris Friesen wrote:
> >>>>There is also work on post-copy migration in QEMU. Normally with live
> >>>>migration, the guest doesn't start executing on the target host until
> >>>>migration has transferred all data. There are many workloads where that
> >>>>doesn't work, as the guest is dirtying data too quickly. With post-copy
> >>>>you can start running the guest on the target at any time, and when it
> >>>>faults on a missing page that will be pulled from the source host. This
> >>>>is slightly more fragile as you risk losing the guest entirely if the
> >>>>source host dies before migration finally completes. It does guarantee
> >>>>that migration will succeed no matter what workload is in the guest.
> >>>>This is probably Nxxxx cycle material.
> >>It seems to me that the ideal solution would be to start doing pre-copy
> >>migration, then if that doesn't converge with the specified downtime value
> >>then maybe have the option to just cut over to the destination and do a
> >>post-copy migration of the remaining data.
> >Yes, that is precisely what the QEMU developers working on this
> >feature suggest we should do. The lazy page faulting on the target
> >host has a performance hit on the guest, so you definitely need
> >to give a little time for pre-copy to start off with, and then
> >switch to post-copy once some benchmark is reached, or if progress
> >info shows the transfer is not making progress.
> >
> >Regards,
> >Daniel
> I'd be a bit concerned about automatically switching to the post copy
> mode. As Daniel commented previously, if something goes wrong on the
> source node the customer's instance could be lost. Many cloud operators
> will want to control the use of this mode. As per my previous message
> this could be something that could be set on or off by default but
> provide a PUT operation on os-migration to update the setting for a
> specific migration

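To make the hybrid scheme discussed above concrete, here is a minimal sketch of how the switch decision could look. Everything in it is an assumption for illustration only: the names Progress, next_action, allow_post_copy and min_pre_copy_s are hypothetical and are not part of Nova, libvirt or QEMU, and "not making progress" is simplified to "the remaining data did not shrink between two samples".

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE_PRE_COPY = "continue_pre_copy"
    SWITCH_TO_POST_COPY = "switch_to_post_copy"


@dataclass
class Progress:
    """One progress sample from a running migration (hypothetical shape)."""
    elapsed_s: float       # seconds since the migration started
    remaining_bytes: int   # data still to be transferred at this sample


def next_action(prev, cur, allow_post_copy, min_pre_copy_s=30.0):
    """Decide what to do after comparing two consecutive progress samples."""
    # Operators who do not accept the post-copy risk always stay on pre-copy.
    if not allow_post_copy:
        return Action.CONTINUE_PRE_COPY
    # Give pre-copy a little time first, since post-copy page faults
    # on the target cost the guest performance.
    if cur.elapsed_s < min_pre_copy_s:
        return Action.CONTINUE_PRE_COPY
    # Simplified "no progress" test: remaining data did not shrink.
    if cur.remaining_bytes >= prev.remaining_bytes:
        return Action.SWITCH_TO_POST_COPY
    return Action.CONTINUE_PRE_COPY
```

A caller would sample progress periodically and act on the returned value; how the switch itself is triggered in the hypervisor is deliberately left out here.
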
NB, if you are concerned about the source host going down while
migration is still taking place, you will lose the VM with pre-copy
mode too, since the VM will of course still be running on the source.
The new failure scenario is essentially about the network connection
between the source & target hosts - if the network layer fails while
post-copy is running, then you lose the VM.

In some sense post-copy will reduce the window of failure, because it
should ensure that the VM migration completes in a shorter & finite
amount of time. I think this is probably particularly important for
host evacuation, so the admin can guarantee to get all the VMs off a
host in a reasonable amount of time.

As such I don't think you need to expose post-copy as a concept in the
API, but I could see a nova.conf value to say whether use of post-copy
is acceptable, so those who want stronger resilience against network
failure can turn off post-copy.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev