On 22/06/2023 22:48, Joao Martins wrote:
> Hey,
>
> This series introduces support for vIOMMU with VFIO device migration,
> particularly related to how we do the dirty page tracking.
>
> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
> provide DMA translation services for guests to provide some form of
> guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
> required for big VMs with VFs with more than 255 vcpus. We tackle both
> and remove the migration blocker when vIOMMU is present provided the
> conditions are met. I have both use-cases here in one series, but I am happy
> to tackle them in separate series.
>
> As I found out, we don't necessarily need to expose the whole vIOMMU
> functionality in order to just support interrupt remapping. x86 IOMMUs
> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
> Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
> can instantiate an IOMMU just for interrupt remapping without needing to
> advertise/support DMA translation. AMD IOMMU in theory can provide
> the same, but Linux doesn't quite support the IR-only part there yet,
> only intel-iommu.
>
> The series is organized as follows:
>
> Patches 1-5: Today we can't gather vIOMMU details before the guest
> establishes its first DMA mapping via the vIOMMU. So these first four
> patches add a way for vIOMMUs to be asked for their properties at start
> of day. I chose the least-churn way possible for now (as opposed to a
> treewide conversion) and allow easy conversion a posteriori. As
> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6], which
> allow us to fetch PCI backing vIOMMU attributes without necessarily
> tying the caller (VFIO or anyone else) to an IOMMU MR like I
> was doing in v3.
>
> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
> DMA translation allowed. Today the 'dma-translation' attribute is
> x86-iommu only, but the way this series is structured nothing stops
> other vIOMMUs from supporting it too, as long as they use
> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
> are handled. The blocker is thus relaxed when vIOMMUs are able to
> toggle/report the DMA_TRANSLATION attribute. With the patches up to this
> point, we've then tackled item (1) of the second paragraph.
>
> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
> IOVA address space, leveraging the logic we use to compose the dirty ranges.
> The blocker is once again relaxed for vIOMMUs that advertise their IOVA
> addressing limits. This tackles item (2). So far I mainly use it with
> intel-iommu, although I have a small set of patches for virtio-iommu per
> Alex's suggestion in v2.
>
> Comments, suggestions welcome. Thanks for the review!
>

By mistake, I sent this a little too early. There are some styling errors
in patches 1, 6 and 10. I've fixed the problems already, but I won't respin
the series as I don't wanna patch bomb folks again. I will give it at least
a week or two before I do that. My apologies :/

Meanwhile, here's the diff of those fixes:

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 989993e303a6..7fad59126215 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3880,7 +3880,7 @@ static int vtd_iommu_get_attr(IOMMUMemoryRegion *iommu_mr,
     {
         hwaddr *max_iova = (hwaddr *)(uintptr_t) data;

-        *max_iova = MAKE_64BIT_MASK(0, s->aw_bits);;
+        *max_iova = MAKE_64BIT_MASK(0, s->aw_bits);
         break;
     }
     default:
@@ -4071,8 +4071,9 @@ static int vtd_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn,
     assert(0 <= devfn && devfn < PCI_DEVFN_MAX);

     vtd_as = vtd_find_add_as(s, bus, devfn, PCI_NO_PASID);
-    if (!vtd_as)
-        return -EINVAL;
+    if (!vtd_as) {
+        return -EINVAL;
+    }

     return memory_region_iommu_get_attr(&vtd_as->iommu, attr, data);
 }
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 91ba6f0927a4..0cf000a9c1ff 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2700,10 +2700,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
     if (!pci_bus_bypass_iommu(bus) && iommu_bus) {
         if (iommu_bus->iommu_fn) {
-            return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
+            return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
         } else if (iommu_bus->iommu_ops &&
                    iommu_bus->iommu_ops->get_address_space) {
-            return iommu_bus->iommu_ops->get_address_space(bus,
+            return iommu_bus->iommu_ops->get_address_space(bus,
                                            iommu_bus->iommu_opaque, devfn);
         }
     }
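
For anyone skimming the first intel_iommu.c hunk, here is a minimal standalone sketch of what the max_iova assignment works out to. It is not part of the series: the macro below is a local copy that I assume mirrors QEMU's MAKE_64BIT_MASK, and max_iova_for_aw_bits() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copy of the mask helper, assumed to mirror QEMU's
 * MAKE_64BIT_MASK(shift, length): 'length' one-bits starting at 'shift'.
 */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/*
 * Hypothetical helper (not in the series): the highest IOVA a vIOMMU
 * with an address width of aw_bits can address.
 */
uint64_t max_iova_for_aw_bits(unsigned int aw_bits)
{
    /* A length of 0 would shift by 64 bits, which is undefined in C. */
    assert(aw_bits >= 1 && aw_bits <= 64);
    return MAKE_64BIT_MASK(0, aw_bits);
}
```

With an address width of 39 bits this gives 0x7fffffffff, and with 48 bits 0xffffffffffff, i.e. the inclusive upper bound of the IOVA space that the dirty tracking in patches 9-15 would need to cover.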