On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
> 
> Hi,
>   This set attempts to make a race condition between migration and
> drive-mirror (and other block users) soluble by allowing the migration
> to be paused after the source qemu releases the block devices but
> before the serialisation of the device state.
> 
> The symptom of this failure, as reported by Wangjie, is a:
>    _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
> 
> and the source qemu dying; so the problem is pretty nasty.
> This has only been seen on 2.9 onwards, but the theory is that
> prior to 2.9 it might have been happening anyway and we were
> perhaps getting unreported corruptions (lost writes); so this
> really needs fixing.
> 
> This flow came from discussions between Kevin and me, and we can't
> see a way of fixing it without exposing a new state to the management
> layer.
> 
> The flow is now:
> 
> (qemu) migrate_set_capability pause-before-device on
> (qemu) migrate -d ...
> (qemu) info migrate
> ...
> Migration status: pause-before-device
> ...
> << issue commands to clean up any block jobs>>
> 
> (qemu) migrate_continue pause-before-device
> (qemu) info migrate
> ...
> Migration status: completed

I'm curious why QEMU doesn't have enough info to clean up the block
jobs automatically? What is the key thing that libvirt knows about
the block jobs that QEMU is lacking? If QEMU had the right info, it
could do it automatically and avoid this extra lock-step synchronization
with libvirt.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
