On Thu, Jan 08, 2015 at 11:29:59AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > On Thu, Jan 08, 2015 at 11:11:29AM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> > >
> > > If the remote host, or networking dies during a migration, the socket can be
> > > waiting for a long timeout, and migration_cancel can't complete the cancel
> > > for a long time (and you can't start a new one to somewhere else).
> > > (Where 'long' is the TCP timeout, that's ~15 mins)
> > >
> > > This patch set uses the shutdown(2) syscall to unblock any write/sends that
> > > are in progress to let the migrate_cancel happen quickly.
> > >
> > > 1/3: socket shutdown
> > >      - An updated patch from my postcopy world to
> > >        add a shut_down method on QEMUFile - only
> > >        for 'socket' (where the syscall is supported).
> > >
> > > 2/3: Handle bi-directional communication for fd migration
> > >      - A patch from Cristian Klein to use the socket
> > >        QEMUFile for FDs that are passed in, if the FDs
> > >        are sockets; this is needed so that libvirt
> > >        migrations can take advantage of the other
> > >        patches.
> > >        Again this patch (and its naming) come from the
> > >        postcopy world.
> > >
> > > 3/3: migration_cancel: shutdown migration socket
> > >      - A new patch that uses the shutdown in
> > >        migrate_fd_cancel
> > >
> > >
> > > Note this does not fix the timeout if you try to migrate to an already
> > > dead host;
> > > the connect timeout is typically a much shorter 2 minutes anyway.
> >
> > In any libvirt managed setup, you'd need to address the connect timeout
> > issue in libvirt instead, since libvirt always uses fd based migration.
> > ie libvirt establishes the connection & then passes the TCP Socket to
> > QEMU.
> >
> > It should be possible for libvirt to use a non-blocking connect()
> > call and catch use of its virDomainMigrateCancel API to stop the
> > connection attempt.
>
> Yes, although as I say I've not fixed the (much shorter) connect side timeout
> on the QEMU side either.
>
> How does libvirt behave if you cancel a tunnelled migration with the network
> dying in the middle?

I think it's just reliant on the TCP timeout - we don't have any of the clever
use of 'shutdown' that you're adding here. I guess it could make sense for
libvirt to use shutdown.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
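
For anyone who wants to see the shutdown(2) effect discussed above in isolation, here is a minimal sketch in Python. It is not the QEMU or libvirt code: the function names (blocked_sender, demo_cancel) and the loopback setup are made up for illustration, and it assumes Linux-like behaviour where shutdown() on a TCP socket wakes a thread that is blocked in send() on the same socket. A peer that accepts the connection but never reads stands in for the dead remote host.

```python
import socket
import threading
import time


def blocked_sender(sock, result):
    """Write until the kernel buffers fill, then sit blocked in send()."""
    chunk = b"\0" * 65536
    try:
        while True:
            sock.sendall(chunk)
    except OSError as exc:
        # Reached once another thread calls shutdown() on the socket.
        result["error"] = exc


def demo_cancel(delay=0.5):
    """Return (sender_unblocked, error_seen_by_sender)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    src = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()  # accepted, but never read from

    result = {}
    worker = threading.Thread(
        target=blocked_sender, args=(src, result), daemon=True
    )
    worker.start()

    # Give the sender time to fill the socket buffers and block.
    time.sleep(delay)

    # The "cancel": shut the socket down from a different thread.
    src.shutdown(socket.SHUT_RDWR)
    worker.join(timeout=5)
    unblocked = not worker.is_alive()

    for s in (src, peer, listener):
        s.close()
    return unblocked, result.get("error")


if __name__ == "__main__":
    unblocked, error = demo_cancel()
    print("sender unblocked:", unblocked)
    print("error seen by sender:", repr(error))
```

Without the shutdown() call the sender thread stays blocked for as long as the peer refuses to read, which is the loopback analogue of waiting out the TCP timeout.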