* Andrea Arcangeli (aarca...@redhat.com) wrote:
> On Thu, Jul 10, 2014 at 02:37:43PM +0100, Dr. David Alan Gilbert wrote:
> > * Eric Blake (ebl...@redhat.com) wrote:
> > > Is there any need for an
> > > event telling libvirt that enough pre-copy has occurred to make a
> > > postcopy worthwhile?
> >
> > I'm not sure that qemu knows much more than management does at that
> > point; any such decision you can make based on an arbitrary cut off
> > (i.e. migration is taking too long) or you could consider something
> > based on some of the other stats that migration already exposes
> > (like the dirty pages stats); if we've got any more stats that you
> > need we can always expose them.
> >
> > Agreed; although we can just do that independently of this big patch set.
>
> It can be independent yes, but I think such event is needed (and once
> we add such event I hope we can get rid of the polling libvirt is
> doing for pure precopy too).
>
> I think for very large guests what should happen is a single _lazy_
> pass of precopy and then immediately postcopy.
>
> That's why I think an event that notifies libvirt when it should issue
> the postcopy command is good, to be able to implement the single
> _lazy_ pass and nothing more than that.
>
> qemu should stop precopy and the source guest just before sending the
> event, so then libvirt can assign all storage to the destination just
> before issuing the postcopy command. By the time the event has been
> raised by qemu, the guest in the source qemu must never run
> anymore. So it is actually the same event needed in pure precopy too
> (except when using precopy+postcopy the "precopy complete" event will
> fire much sooner). We'll still need a parameter to precopy to tell
> qemu when precopy should stop.

That's an interesting, different type of event; I think we probably
have that first-pass information, but it's not part of the 'state'
(i.e. the started/completed/cancelled enum).

> The single precopy lazy pass would consist of clearing the dirty
> bitmap, starting precopy, then if any page is found dirty by the time
> precopy tries to send it, we skip it. We only send those pages in
> precopy that haven't been modified yet by the time we reach them in
> precopy.
>
> Pages heavily modified will be sent purely through
> postcopy. Ultimately postcopy will be a page sorting feature to
> massively decrease the downtime latency, and to reduce to 2*ramsize
> the maximum amount of data transferred on the network without having
> to slow down the guest artificially. We'll also know exactly the
> maximum time in advance that it takes to migrate a large host no
> matter the load in it (2*ramsize divided by the network bandwidth
> available at the migration time). It'll be totally deterministic, no
> black magic slowdowns anymore.

There is a trade-off; killing the precopy early does reduce the network
bandwidth used, but the other side of it is that you would incur more
postcopy round trips, so your average latency will probably increase.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
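
For illustration only, the single lazy pass and the 2*ramsize bound
quoted above can be sketched as a toy simulation. This is not QEMU code:
the function names, the (step, page) write model, and the assumption
that precopy visits page i at step i are all made up for the example,
and the time bound ignores protocol overhead and device state.

```python
def lazy_precopy_pass(num_pages, writes):
    """Simulate one lazy precopy pass (hypothetical model, not QEMU code).

    writes: iterable of (step, page) guest writes; precopy is assumed to
    visit page i at step i. Writes after the pass are ignored because the
    source guest is stopped before postcopy starts.
    Returns (pages sent by precopy, pages left for postcopy).
    """
    pending = sorted(writes)
    dirty = set()          # dirty bitmap, cleared at the start of the pass
    next_write = 0
    sent_precopy = []
    for page in range(num_pages):
        # Apply guest writes that land before precopy reaches this page.
        while next_write < len(pending) and pending[next_write][0] <= page:
            dirty.add(pending[next_write][1])
            next_write += 1
        # Lazy rule: skip pages already dirty, send only untouched ones.
        if page not in dirty:
            sent_precopy.append(page)
    # Skipped pages, plus pages dirtied after being sent, go via postcopy.
    return sent_precopy, sorted(dirty)


def worst_case_seconds(ram_bytes, bandwidth_bytes_per_sec):
    """Upper bound from the quoted text: 2*ramsize / network bandwidth."""
    return 2 * ram_bytes / bandwidth_bytes_per_sec


if __name__ == "__main__":
    pre, post = lazy_precopy_pass(4, [(0, 2), (3, 0)])
    print("precopy:", pre, "postcopy:", post)
    print("bound (s):", worst_case_seconds(64 * 2**30, 2**30))
```

Each page is sent at most once by precopy and at most once by postcopy,
which is where the 2*ramsize ceiling comes from in this model. The more
pages end up in the postcopy list, the more remote page faults the
destination can take, which is the latency cost mentioned above.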