Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru> writes:

> On 30.09.24 17:07, Andrey Drobyshev wrote:
>> On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> [add migration maintainers]
>>>
>>> On 24.09.24 15:56, Andrey Drobyshev wrote:
>>>> [...]
>>>
>>> I doubt that this is a correct way to go.
>>>
>>> As far as I understand, "inactive" actually means that "the storage does
>>> not belong to QEMU, but to someone else (another QEMU process, for
>>> example), and may be changed transparently". In turn this means that
>>> QEMU should do nothing with inactive disks. So the problem is that
>>> nobody called bdrv_activate_all() on the target, and we shouldn't
>>> ignore that.
>>>
>>> Hmm, I see that in process_incoming_migration_bh() we do call
>>> bdrv_activate_all(), but only in some scenarios. Maybe the condition
>>> should be less strict here.
>>>
>>> Why do we need any condition here at all? Don't we want to activate the
>>> block layer on the target after migration anyway?
>>>
>>
>> Hmm, I'm not sure about unconditional activation, since we at least have
>> to honor the LATE_BLOCK_ACTIVATE cap if it's set (and probably delay the
>> activation in such a case). In current libvirt upstream I see this code:
>>
>>> /* Migration capabilities which should always be enabled as long as they
>>>  * are supported by QEMU. If the capability is supposed to be enabled on both
>>>  * sides of migration, it won't be enabled unless both sides support it.
>>>  */
>>> static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = {
>>>     {QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,
>>>      QEMU_MIGRATION_SOURCE},
>>>
>>>     {QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,
>>>      QEMU_MIGRATION_DESTINATION},
>>> };
>>
>> which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.
>>
>> The code from process_incoming_migration_bh() you're referring to:
>>
>>>     /* If capability late_block_activate is set:
>>>      * Only fire up the block code now if we're going to restart the
>>>      * VM, else 'cont' will do it.
>>>      * This causes file locking to happen; so we don't want it to happen
>>>      * unless we really are starting the VM.
>>>      */
>>>     if (!migrate_late_block_activate() ||
>>>         (autostart && (!global_state_received() ||
>>>             runstate_is_live(global_state_get_runstate())))) {
>>>         /* Make sure all file formats throw away their mutable metadata.
>>>          * If we get an error here, just don't restart the VM yet. */
>>>         bdrv_activate_all(&local_err);
>>>         if (local_err) {
>>>             error_report_err(local_err);
>>>             local_err = NULL;
>>>             autostart = false;
>>>         }
>>>     }
>>
>> It states explicitly that we're either going to start the VM right at
>> this point if (autostart == true), or we wait until a "cont" command
>> happens. Neither of those is going to happen if we start another
>> migration while still being in the PAUSED state. So I think it seems
>> reasonable to take such a case into account. For instance, this patch
>> does prevent the crash:
>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index ae2be31557..3222f6745b 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
>>>       */
>>>      if (!migrate_late_block_activate() ||
>>>          (autostart && (!global_state_received() ||
>>> -        runstate_is_live(global_state_get_runstate())))) {
>>> +        runstate_is_live(global_state_get_runstate()))) ||
>>> +        (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
>>>          /* Make sure all file formats throw away their mutable metadata.
>>>           * If we get an error here, just don't restart the VM yet. */
>>>          bdrv_activate_all(&local_err);
>>
>> What are your thoughts on it?
>>

This bug is the same as https://gitlab.com/qemu-project/qemu/-/issues/2395

> Hmmm... Don't we violate the "late-block-activate" contract by this?
>
> Let me go and check:
>
> # @late-block-activate: If enabled, the destination will not activate
> #     block devices (and thus take locks) immediately at the end of
> #     migration.  (since 3.0)
>
> Yes, we'll violate it by this patch. So, for now the only exception is
> when autostart is enabled, but libvirt correctly uses
> late-block-activate + !autostart.
>
> Interesting, where is the block layer assumed to be activated... Aha,
> only in qmp_cont().
>
>
> So, what to do with all this:
>
> Either libvirt should not use late-block-activate for migration of a
> stopped VM. This way the target would be activated automatically.
>
> Or, if libvirt still needs postponed activation (I assume, for correctly
> switching shared disks, etc.), libvirt should do some additional QMP
> call. It can't be "cont", if we don't want to run the VM. So, probably,
> we need an additional "block-activate" QMP command for this.

A third option might be to unconditionally activate in qmp_migrate:

-- >8 --
From 1890c7989e951a2702735f933d1567e48fa464a5 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <faro...@suse.de>
Date: Wed, 9 Oct 2024 17:51:57 -0300
Subject: [PATCH] tmp

---
 migration/migration.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 021faee2f3..6bf1f039d1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2068,6 +2068,16 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
         return false;
     }
 
+    /*
+     * The VM might have been the target of a previous migration. If it
+     * was in the paused state then nothing will have required the
+     * block layer to be activated. Do it now to ensure this QEMU
+     * instance owns the disk locks.
+     */
+    if (!resume && runstate_check(RUN_STATE_PAUSED)) {
+        bdrv_activate_all(errp);
+    }
+
     return true;
 }
-- 
2.35.3
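
For reference, a minimal sketch of what the second option above (a
dedicated QMP command, name taken from Vladimir's suggestion) might amount
to; this is only an illustration of the idea, not an existing QEMU command,
and it would also need a QAPI schema entry along the lines of
{ 'command': 'block-activate' }:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "block/block.h"

/*
 * Hypothetical handler for a "block-activate" QMP command: take back
 * ownership of the disks (and their locks) without resuming the guest,
 * i.e. the part that qmp_cont() otherwise does as a side effect.
 */
void qmp_block_activate(Error **errp)
{
    bdrv_activate_all(errp);
}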