On Mon, 16 May 2022 20:08:32 -0300 Jason Gunthorpe <j...@nvidia.com> wrote:
> On Mon, May 16, 2022 at 02:22:00PM -0600, Alex Williamson wrote:
> > On Mon, 16 May 2022 13:22:14 +0200
> > Juan Quintela <quint...@redhat.com> wrote:
> > 
> > > Avihai Horon <avih...@nvidia.com> wrote:
> > > > Currently, if IOMMU of a VFIO container doesn't support dirty page
> > > > tracking, migration is blocked completely. This is because a DMA-able
> > > > VFIO device can dirty RAM pages without updating QEMU about it, thus
> > > > breaking the migration.
> > > >
> > > > However, this doesn't mean that migration can't be done at all. If
> > > > migration pre-copy phase is skipped, the VFIO device doesn't have a
> > > > chance to dirty RAM pages that have been migrated already, thus
> > > > eliminating the problem previously mentioned.
> > > >
> > > > Hence, in such case allow migration but skip pre-copy phase.
> > > >
> > > > Signed-off-by: Avihai Horon <avih...@nvidia.com>
> > > 
> > > I don't know (TM).
> > > Several issues:
> > > - Patch is ugly as hell (ok, that depends on taste)
> > > - It changes migration_iteration_run() instead of directly
> > >   migration_thread.
> > > - There is already another case where we skip the sending of RAM
> > >   (localhost migration with shared memory)
> > > 
> > > In migration/ram.c:
> > > 
> > > static int ram_find_and_save_block(RAMState *rs, bool last_stage)
> > > {
> > >     PageSearchStatus pss;
> > >     int pages = 0;
> > >     bool again, found;
> > > 
> > >     /* No dirty page as there is zero RAM */
> > >     if (!ram_bytes_total()) {
> > >         return pages;
> > >     }
> > > 
> > > This is the other place where we _don't_ send any RAM at all.
> > > 
> > > I don't have a great idea about how to make things clear at a higher
> > > level, I have to think about this.
> > 
> > It seems like if we have devices dictating what type of migrations can
> > be performed then there probably needs to be a switch to restrict use of
> > such devices just as we have the -only-migratable switch now to prevent
> > attaching devices that don't support migration.  I'd guess that we need
> > the switch to opt-in to allowing such devices to maintain
> > compatibility.  There's probably a whole pile of qapi things missing to
> > expose this to management tools as well.  Thanks,
> 
> This is really intended to be a NOP from where things are now, as if
> you use mlx5 live migration without a patch like this then it causes a
> botched pre-copy since everything just ends up permanently dirty.
> 
> If it makes more sense we could abort the pre-copy too - in the end
> there will be dirty tracking so I don't know if I'd invest in a big
> adventure to fully define non-dirty tracking migration.

How is pre-copy currently "botched" without a patch like this?  If
it's simply that the pre-copy doesn't converge and the downtime
constraints don't allow the VM to enter stop-and-copy, that's the
expected behavior AIUI, and it supports backwards compatibility with
existing SLAs.

I'm assuming that by setting this new skip_precopy flag we're forcing
the VM to move to stop-and-copy, regardless of any other SLA
constraints placed on the migration.

It seems like a better solution would be to expose to management tools
that the VM contains a device that does not support the pre-copy phase
so that downtime expectations can be adjusted.  Thanks,

Alex
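
Below is a minimal, self-contained C sketch of the opt-in policy Alex
suggests above: a VM-wide switch, analogous to -only-migratable, that
decides whether a device lacking dirty-page tracking may be attached at
all, and whether migration must then skip the pre-copy phase.  All names
here (vm_device, allow_no_precopy, and so on) are hypothetical
illustrations, not QEMU or VFIO API.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    bool supports_dirty_tracking;   /* can report pages it dirties during pre-copy */
} vm_device;

/* Hypothetical opt-in switch: admit devices that force us to skip pre-copy. */
static bool allow_no_precopy = false;

/* Policy check at device attach time. */
static bool device_attach_allowed(const vm_device *dev)
{
    return dev->supports_dirty_tracking || allow_no_precopy;
}

/* True if any attached device forces the migration to skip pre-copy. */
static bool migration_must_skip_precopy(const vm_device *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!devs[i].supports_dirty_tracking) {
            return true;
        }
    }
    return false;
}

int main(void)
{
    vm_device devs[] = {
        { "virtio-net", true },
        { "vfio-pci (no IOMMU dirty tracking)", false },
    };
    int n = (int)(sizeof(devs) / sizeof(devs[0]));

    allow_no_precopy = true;    /* pretend the admin opted in with the new switch */

    for (int i = 0; i < n; i++) {
        printf("%s: attach %s\n", devs[i].name,
               device_attach_allowed(&devs[i]) ? "allowed" : "blocked");
    }
    printf("migration: %s\n",
           migration_must_skip_precopy(devs, n)
           ? "skip pre-copy, go straight to stop-and-copy"
           : "normal pre-copy");
    return 0;
}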
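
For the convergence point Alex raises, pre-copy normally hands over to
stop-and-copy only once the data still pending fits within the configured
downtime budget, and a skip_precopy-style flag would force the switch
regardless of that SLA.  The following simplified, standalone sketch
illustrates the idea; the bandwidth and downtime numbers are made up and
the helpers are not actual QEMU functions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Bytes we can still transfer within the allowed downtime window. */
static uint64_t downtime_threshold(uint64_t bandwidth_bytes_per_s,
                                   uint64_t downtime_limit_ms)
{
    return bandwidth_bytes_per_s * downtime_limit_ms / 1000;
}

static bool ready_for_stop_and_copy(uint64_t pending_bytes,
                                    uint64_t threshold_bytes,
                                    bool skip_precopy)
{
    /* A skip_precopy-style flag bypasses the downtime/convergence check. */
    return skip_precopy || pending_bytes <= threshold_bytes;
}

int main(void)
{
    uint64_t bw = UINT64_C(1250) * 1000 * 1000;          /* ~10 Gbit/s link */
    uint64_t threshold = downtime_threshold(bw, 300);    /* 300 ms downtime limit */
    uint64_t pending = UINT64_C(4) * 1024 * 1024 * 1024; /* 4 GiB still dirty */

    printf("threshold %" PRIu64 " bytes, pending %" PRIu64 " bytes\n",
           threshold, pending);
    printf("normal policy: %s\n",
           ready_for_stop_and_copy(pending, threshold, false)
           ? "enter stop-and-copy" : "keep iterating pre-copy");
    printf("skip_precopy:  %s\n",
           ready_for_stop_and_copy(pending, threshold, true)
           ? "enter stop-and-copy (downtime SLA ignored)"
           : "keep iterating pre-copy");
    return 0;
}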