-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/03/2014 12:52 AM, Stefan Hajnoczi wrote: > On Thu, Oct 02, 2014 at 06:52:52PM +1000, Alexey Kardashevskiy wrote: >> When migrated using libvirt with "--copy-storage-all", at the end >> of migration there is race between NBD mirroring task trying to do >> flush and migration completion, both end up invalidating cache. >> Since qcow2 driver does not handle this situation very well, random >> crashes happen. >> >> This disables the BDRV_O_INCOMING flag for the block device being >> migrated and restores it when NBD task is done. >> >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> --- >> >> >> The commit log is not full and most likely incorrect as well as the >> patch :) Please, help. Thanks! >> >> The patch seems to fix the initial problem though. >> >> >> btw is there any easy way to migrate one QEMU to another using NBD >> (i.e. not using "migrate -b") and not using libvirt? What would the >> command line be? Debugging with libvirt is real pain :( >> >> >> --- block.c | 17 ++++------------- migration.c | 1 - nbd.c >> | 11 +++++++++++ 3 files changed, 15 insertions(+), 14 deletions(-) >> >> diff --git a/block.c b/block.c index c5a251c..ed72e0a 100644 --- >> a/block.c +++ b/block.c @@ -5073,6 +5073,10 @@ void >> bdrv_invalidate_cache_all(Error **errp) QTAILQ_FOREACH(bs, >> &bdrv_states, device_list) { AioContext *aio_context = >> bdrv_get_aio_context(bs); >> >> + if (!(bs->open_flags & BDRV_O_INCOMING)) { + >> continue; + } + > > We shouldn't touch bs before acquiring the AioContext. Acquiring the > AioContext is basically the "lock" for the BDS. > > It needs to be moved... > >> aio_context_acquire(aio_context); > > ...in here. > >> bdrv_invalidate_cache(bs, &local_err); >> aio_context_release(aio_context); @@ -5083,19 +5087,6 @@ void >> bdrv_invalidate_cache_all(Error **errp) } } >> >> -void bdrv_clear_incoming_migration_all(void) -{ - >> BlockDriverState *bs; - - QTAILQ_FOREACH(bs, &bdrv_states, >> device_list) { - AioContext *aio_context = >> bdrv_get_aio_context(bs); - - >> aio_context_acquire(aio_context); - bs->open_flags = >> bs->open_flags & ~(BDRV_O_INCOMING); - >> aio_context_release(aio_context); - } -} - int >> bdrv_flush(BlockDriverState *bs) { Coroutine *co; diff --git >> a/migration.c b/migration.c index 8d675b3..c49a05a 100644 --- >> a/migration.c +++ b/migration.c @@ -103,7 +103,6 @@ static void >> process_incoming_migration_co(void *opaque) } qemu_announce_self(); >> >> - bdrv_clear_incoming_migration_all(); /* Make sure all file >> formats flush their mutable metadata */ >> bdrv_invalidate_cache_all(&local_err); > > BDRV_O_INCOMING needs to be cleared, otherwise the block drivers will > think that incoming migration is still taking place and treat the > file as effectively read-only during open. > > On IRC I suggested doing away with the bdrv_invalidate_cache_all() > name since it no longer calls bdrv_invalidate_cache() on all BDSes. > > Combine both clearing BDRV_O_INCOMING and calling > bdrv_invalidate_cache() if BDRV_O_INCOMING was previously set into > one function - you could reuse bdrv_clear_incoming_migration_all() for > that.
I reread why bdrv_invalidate_cache() was added in the first place and what BDRV_O_INCOMING is, every time I look at the code it plays in new colors (not sure if it is a correct figure of speech :) ). BDRV_O_INCOMING is only set when QEMU is about to receive migration and we do not want QEMU to check the file at opening time as there is likely garbage. Is there any other use of BDRV_O_INCOMING? There must be some as bdrv_clear_incoming_migration_all() is called at the end of live migration and I believe there must be a reason for it (cache does not migrate, does it?). bdrv_invalidate_cache() flushes cache as it could be already initialized even if QEMU is receiving migration - QEMU could have cached some of real disk data. Is that correct? I do not really understand why it would happen if there is BDRV_O_INCOMING set but ok. I also thought that bdrv_invalidate_cache() is called more often as its name suggests, not during migration only, I was wrong here. The patch below then should solve the initial problem. bdrv_clear_incoming_migration_all() is unclear though... diff --git a/block.c b/block.c index c5a251c..6314af7 100644 - --- a/block.c +++ b/block.c @@ -5048,6 +5048,11 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp) return; } + if (!(bs->open_flags & BDRV_O_INCOMING)) { + return; + } + bs->open_flags &= ~(BDRV_O_INCOMING); + if (bs->drv->bdrv_invalidate_cache) { bs->drv->bdrv_invalidate_cache(bs, &local_err); } else if (bs->file) { @@ -5083,19 +5088,6 @@ void bdrv_invalidate_cache_all(Error **errp) } } - -void bdrv_clear_incoming_migration_all(void) - -{ - - BlockDriverState *bs; - - - - QTAILQ_FOREACH(bs, &bdrv_states, device_list) { - - AioContext *aio_context = bdrv_get_aio_context(bs); - - - - aio_context_acquire(aio_context); - - bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING); - - aio_context_release(aio_context); - - } - -} - - int bdrv_flush(BlockDriverState *bs) { Coroutine *co; diff --git a/migration.c b/migration.c index 8d675b3..c49a05a 100644 - --- a/migration.c +++ b/migration.c @@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque) } qemu_announce_self(); - - bdrv_clear_incoming_migration_all(); /* Make sure all file formats flush their mutable metadata */ bdrv_invalidate_cache_all(&local_err); if (local_err) { diff --git a/nbd.c b/nbd.c index e9b539b..953c378 100644 - --- a/nbd.c +++ b/nbd.c @@ -972,6 +972,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, exp->ctx = bdrv_get_aio_context(bs); bdrv_ref(bs); bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp); + bdrv_invalidate_cache(bs, NULL); return exp; } >> if (local_err) { diff --git a/nbd.c b/nbd.c index e9b539b..7b479c0 >> 100644 --- a/nbd.c +++ b/nbd.c @@ -106,6 +106,7 @@ struct NBDExport >> { off_t dev_offset; off_t size; uint32_t nbdflags; + bool >> restore_incoming; > > Not needed, it does not make sense to restore BDRV_O_INCOMING because > once we've written to a file it cannot be in use by another host at > the same time. > >> QTAILQ_HEAD(, NBDClient) clients; QTAILQ_ENTRY(NBDExport) next; >> >> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, >> off_t dev_offset, exp->ctx = bdrv_get_aio_context(bs); >> bdrv_ref(bs); bdrv_add_aio_context_notifier(bs, bs_aio_attached, >> bs_aio_detach, exp); + + if (bs->open_flags & BDRV_O_INCOMING) { > > I think the flag has to be cleared before calling > bdrv_invalidate_cache() because the .bdrv_open() function looks at > the flag. > - -- Alexey -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJULiIVAAoJEIYTPdgrwSC57ZIP+gI2ckqbMTf2LFl4MVP1oOEZ IkbZGiWJRfVsUitKD4+fjYB7ysnkt1hlrqdaaoTaPPzst676cGek91KdkpxqgorU L80e5TaRO/Ltx65yN1gMHhGJJnck9KvnzSp9atqUZNBGiMbYDPj8lN5NeyBadwdk 37uR/SvFrVRUWMVpgs7XF7q60C6YRnoGy+8qqmUSlwx3aYtFZpOfSyQYWkY5mT5j 3nsymm1ieFAQOd255K6X1oZW6GjwNCXa5MsUspK5gokPufjZQguGorRUMdBDmKa+ L9dvvMDG0I88027W3eTI0a2WI0D3BllizNuThacDHQEiLTIVGwd9VIDPYUJiz85W 7p88H6c8AozhUlcQiNK1o185HFCC+bcxlX0rHHRTLG+WfEWPMJ0Y4iewBvVa/IPZ Dzje5mEvdOm5MeU1/BiyTA6fc5nN5CpVth5mu7stMnnVvEAjK+gttKqTMMcR4BWa bzbREsrw97eqgGZcqH6T8X69Y9kEjgofdvs9uOlH4XlCD5CjJZNoylF8bzcwAFnO jlDNPAHCJATKLxIH6SClwaVf74lRn6Jv1WEKwKX9zuAq8rN34mSVT4VauEvRSsP0 RKnkqbUl37MZcf/dpiqr3M2W0pXDArvPZpme42IBJkTtpgFH5jC5Cry9OipOPqNc dFVR8sIgTQ4HLktVx4Fz =3bSl -----END PGP SIGNATURE-----