On Thu, 25 Jun 2020 20:04:08 +0530 Kirti Wankhede <kwankh...@nvidia.com> wrote:

> On 6/25/2020 12:25 AM, Alex Williamson wrote:
> > On Sun, 21 Jun 2020 01:51:22 +0530
> > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> >
> >> Create mapped iova list when vIOMMU is enabled. For each mapped iova
> >> save translated address. Add node to list on MAP and remove node from
> >> list on UNMAP.
> >> This list is used to track dirty pages during migration.
> >
> > This seems like a lot of overhead to support that the VM might migrate.
> > Is there no way we can build this when we start migration, for example
> > replaying the mappings at that time? Thanks,
> >
>
> In my previous version I tried to go through whole range and find valid
> iotlb, as below:
>
> +    if (memory_region_is_iommu(section->mr)) {
> +        iotlb = address_space_get_iotlb_entry(container->space->as, iova,
> +                                              true, MEMTXATTRS_UNSPECIFIED);
>
> When mapping doesn't exist, qemu throws error as below:
>
> qemu-system-x86_64: vtd_iova_to_slpte: detected slpte permission error
> (iova=0x0, level=0x3, slpte=0x0, write=1)
> qemu-system-x86_64: vtd_iommu_translate: detected translation failure
> (dev=00:03:00, iova=0x0)
> qemu-system-x86_64: New fault is not recorded due to compression of faults

My assumption would have been that we use the replay mechanism, which
is known to work because we need to use it when we hot-add a device.
We'd make use of iommu_notifier_init() to create a new handler for this
purpose, then we'd walk our container->giommu_list and call
memory_region_iommu_replay() for each.

Peter, does this sound like the right approach to you?

> Secondly, it iterates through whole range with IOMMU page size
> granularity which is 4K, so it takes long time resulting in large
> downtime. With this optimization, downtime with vIOMMU reduced
> significantly.

Right, but we amortize that overhead and the resulting bloat across the
99.9999% of the time that we're not migrating. I wonder if we could
start up another thread to handle this when we enable dirty logging. We
don't really need the result until we start processing the dirty
bitmap, right? Also, if we're dealing with this many separate pages,
shouldn't we be using a tree rather than a list to give us O(logN)
rather than O(N)?

> Other option I will try if I can check that if migration is supported
> then only create this list.

Wouldn't we still have problems if we start with a guest IOMMU domain
with a device that doesn't support migration, hot-add a device that
does support migration, then hot-remove the original device? Seems
like our list would only be complete since the migration device was
added. Thanks,

Alex
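
For reference, the tree-versus-list point could be sketched roughly as
below. This is only an illustration under stated assumptions:
MappedRange, range_insert(), range_lookup() and range_remove() are
hypothetical names, not existing QEMU code, and a plain unbalanced
binary search tree is used only to keep the sketch short. Lookups are
O(logN) only while the tree stays reasonably balanced, so a balanced
tree such as GLib's GTree would likely be the more natural choice in
QEMU itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one mapped IOVA range and its translated address. */
typedef struct MappedRange {
    uint64_t iova;        /* start of the mapped range (tree key) */
    uint64_t size;        /* length of the range in bytes */
    uint64_t translated;  /* translated address saved at MAP time */
    struct MappedRange *left, *right;
} MappedRange;

/* Called on MAP. Assumes ranges never overlap. Returns the new root. */
static MappedRange *range_insert(MappedRange *root, uint64_t iova,
                                 uint64_t size, uint64_t translated)
{
    assert(size > 0);
    if (!root) {
        MappedRange *n = calloc(1, sizeof(*n));
        if (!n) {
            abort();
        }
        n->iova = iova;
        n->size = size;
        n->translated = translated;
        return n;
    }
    if (iova < root->iova) {
        root->left = range_insert(root->left, iova, size, translated);
    } else {
        root->right = range_insert(root->right, iova, size, translated);
    }
    return root;
}

/* Find the range containing addr, or NULL if addr is not mapped. */
static MappedRange *range_lookup(MappedRange *root, uint64_t addr)
{
    while (root) {
        if (addr < root->iova) {
            root = root->left;
        } else if (addr - root->iova < root->size) {
            return root;
        } else {
            root = root->right;
        }
    }
    return NULL;
}

/* Called on UNMAP with the range's start address. Returns the new root. */
static MappedRange *range_remove(MappedRange *root, uint64_t iova)
{
    if (!root) {
        return NULL;
    }
    if (iova < root->iova) {
        root->left = range_remove(root->left, iova);
        return root;
    }
    if (iova > root->iova) {
        root->right = range_remove(root->right, iova);
        return root;
    }
    if (!root->left || !root->right) {
        /* Zero or one child: splice the node out. */
        MappedRange *child = root->left ? root->left : root->right;
        free(root);
        return child;
    }
    /* Two children: take over the in-order successor, then remove it. */
    MappedRange *succ = root->right;
    while (succ->left) {
        succ = succ->left;
    }
    root->iova = succ->iova;
    root->size = succ->size;
    root->translated = succ->translated;
    root->right = range_remove(root->right, succ->iova);
    return root;
}
```

With something like this, the dirty-page path would call range_lookup()
per IOVA instead of walking the whole list, while MAP and UNMAP keep
the tree current through range_insert() and range_remove().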