On Mon, 16 May 2022 13:22:14 +0200 Juan Quintela <quint...@redhat.com> wrote:
> Avihai Horon <avih...@nvidia.com> wrote: > > Currently, if IOMMU of a VFIO container doesn't support dirty page > > tracking, migration is blocked completely. This is because a DMA-able > > VFIO device can dirty RAM pages without updating QEMU about it, thus > > breaking the migration. > > > > However, this doesn't mean that migration can't be done at all. If > > migration pre-copy phase is skipped, the VFIO device doesn't have a > > chance to dirty RAM pages that have been migrated already, thus > > eliminating the problem previously mentioned. > > > > Hence, in such case allow migration but skip pre-copy phase. > > > > Signed-off-by: Avihai Horon <avih...@nvidia.com> > > I don't know (TM). > Several issues: > - Patch is ugly as hell (ok, that depends on taste) > - It changes migration_iteration_run() instead of directly > migration_thread. > - There is already another case where we skip the sending of RAM > (localhost migration with shared memory) > > In migration/ram.c: > > static int ram_find_and_save_block(RAMState *rs, bool last_stage) > { > PageSearchStatus pss; > int pages = 0; > bool again, found; > > /* No dirty page as there is zero RAM */ > if (!ram_bytes_total()) { > return pages; > } > > This is the other place where we _don't_ send any RAM at all. > > I don't have a great idea about how to make things clear at a higher > level, I have to think about this. It seems like if we have devices dictating what type of migrations can be performed then there probably needs to be a switch to restrict use of such devices just as we have the -only-migratable switch now to prevent attaching devices that don't support migration. I'd guess that we need the switch to opt-in to allowing such devices to maintain compatibility. There's probably a whole pile of qapi things missing to expose this to management tools as well. Thanks, Alex > > --- > > hw/vfio/migration.c | 9 ++++++++- > > migration/migration.c | 5 +++++ > > migration/migration.h | 3 +++ > > 3 files changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > index 21e8f9d4d4..d4b6653026 100644 > > --- a/hw/vfio/migration.c > > +++ b/hw/vfio/migration.c > > @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error > > **errp) > > struct vfio_region_info *info = NULL; > > int ret = -ENOTSUP; > > > > - if (!vbasedev->enable_migration || !container->dirty_pages_supported) { > > + if (!vbasedev->enable_migration) { > > goto add_blocker; > > } > > > > + if (!container->dirty_pages_supported) { > > + warn_report( > > + "%s: IOMMU of the device's VFIO container doesn't support > > dirty page tracking, migration pre-copy phase will be skipped", > > + vbasedev->name); > > + migrate_get_current()->skip_precopy = true; > > + } > > + > > ret = vfio_get_dev_region_info(vbasedev, > > VFIO_REGION_TYPE_MIGRATION_DEPRECATED, > > > > VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED, > > diff --git a/migration/migration.c b/migration/migration.c > > index 5a31b23bd6..668343508d 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -3593,6 +3593,11 @@ static MigIterateState > > migration_iteration_run(MigrationState *s) > > uint64_t pending_size, pend_pre, pend_compat, pend_post; > > bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE; > > > > + if (s->skip_precopy) { > > + migration_completion(s); > > + return MIG_ITERATE_BREAK; > > + } > > + > > qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre, > > &pend_compat, &pend_post); > > pending_size = pend_pre + pend_compat + pend_post; > > diff --git a/migration/migration.h b/migration/migration.h > > index a863032b71..876713e7e1 100644 > > --- a/migration/migration.h > > +++ b/migration/migration.h > > @@ -332,6 +332,9 @@ struct MigrationState { > > * This save hostname when out-going migration starts > > */ > > char *hostname; > > + > > + /* Whether to skip pre-copy phase of migration or not */ > > + bool skip_precopy; > > }; > > > > void migrate_set_state(int *state, int old_state, int new_state); >