On 11/25/24 4:46 PM, Fabiano Rosas wrote:
> Currently a VM that has been target of a migration using
> late-block-activate will crash at the end of a new migration (with it
> as source) when releasing ownership of the disks due to the VM having
> never taken ownership of the disks in the first place.
> 
> The issue is that late-block-activate expects a qmp_continue command
> to be issued at some point on the destination VM after the migration
> finishes. If the user decides to never continue the VM, but instead
> issue a new migration, then bdrv_activate_all() will never be called
> and the assert will be reached:
> 
> bdrv_inactivate_recurse: Assertion `!(bs->open_flags &
> BDRV_O_INACTIVE)' failed.
> 
> Fix the issue by checking at the start of migration if the VM is
> paused and call bdrv_activate_all() before migrating. Even if the
> late-block-activate capability is not in play or if the VM has been
> paused manually, there is no harm calling that function again.
> 
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> ---
>  migration/migration.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index aedf7f0751..26af30137b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2029,6 +2029,25 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
>          return false;
>      }
>  
> +    /*
> +     * The VM might have been target of a previous migration. If it
> +     * was in the paused state then nothing will have required the
> +     * block layer to be activated. Do it now to ensure this QEMU
> +     * instance owns the disk locks.
> +     */
> +    if (!resume && runstate_check(RUN_STATE_PAUSED)) {
> +        Error *local_err = NULL;
> +
> +        g_assert(bql_locked());
> +
> +        bdrv_activate_all(&local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return false;
> +        }
> +        s->block_inactive = false;
> +    }
> +
>      return true;
>  }
> 

Hi Fabiano,

Thanks for the fix. I can confirm that on my setup (2 nodes with a common
NFS share and the VM's disk on that share) this patch solves the issue.
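
For anyone following the thread, the failure mode and the fix can be
modeled in a few lines of standalone C. This is only a toy sketch and not
QEMU code: every Toy*/toy_* name below is made up for illustration, the
flag merely mimics BDRV_O_INACTIVE, and the model ignores the !resume
condition that the real check in migrate_prepare() also tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BDRV_O_INACTIVE; not the real QEMU flag. */
enum { TOY_O_INACTIVE = 1 << 0 };

typedef struct {
    int open_flags;
} ToyBlockState;

typedef struct {
    ToyBlockState disk;
    bool paused;
} ToyVM;

/*
 * State of a destination VM after an incoming migration with
 * late-block-activate when the user never issues "cont":
 * the VM is paused and its disk is still marked inactive.
 */
static ToyVM toy_incoming_migration_done(void)
{
    ToyVM vm = { .disk = { .open_flags = TOY_O_INACTIVE }, .paused = true };
    return vm;
}

/* Idempotent: activating an already active disk changes nothing. */
static void toy_activate_all(ToyVM *vm)
{
    vm->disk.open_flags &= ~TOY_O_INACTIVE;
}

/* Returns false at the point where the real code would hit the assertion. */
static bool toy_inactivate(ToyVM *vm)
{
    if (vm->disk.open_flags & TOY_O_INACTIVE) {
        return false;
    }
    vm->disk.open_flags |= TOY_O_INACTIVE;
    return true;
}

/* Outgoing migration; with_fix mirrors the idea of the added check. */
static bool toy_migrate_out(ToyVM *vm, bool with_fix)
{
    if (with_fix && vm->paused) {
        toy_activate_all(vm);
    }
    return toy_inactivate(vm);
}

/* Exercises both paths of the model. */
static void toy_demo(void)
{
    ToyVM unfixed = toy_incoming_migration_done();
    ToyVM fixed = toy_incoming_migration_done();

    assert(!toy_migrate_out(&unfixed, false)); /* would have asserted */
    assert(toy_migrate_out(&fixed, true));     /* activation came first */
}
```

In this model the second migration only succeeds when activation happens
before the disks are released, which matches what I see with the patch
applied.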

Tested-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com>
