On Tue, 20 Oct 2020 11:24:58 +0200 Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 19/10/20 21:02, Alex Williamson wrote:
> >> For KVM we were thinking of changing the whole
> >> memory map with a single ioctl, but that's much easier because KVM
> >> builds its page tables lazily.  It would be possible for the IOMMU too
> >> but it would require a relatively complicated comparison of the old and
> >> new memory maps in the kernel.
> >
> > We can only build IOMMU page tables lazily if we get faults, which we
> > generally don't.  We also cannot atomically update IOMMU page tables
> > relative to a device,
>
> Yeah, I didn't mean building IOMMU page tables lazily, rather replacing
> the whole VFIO memory map with a single ioctl.
>
> I don't think that requires atomic updates of the IOMMU page table root,
> but it would require atomic updates of IOMMU page table entires; VFIO
> would compare the old and new memory map and modify the page tables when
> it sees a difference.  Is that possible?

Theoretically possible, probably.  I imagine any IOMMU worth its
silicon should be able to update ptes atomically, but there's probably
a substantial re-engineering of the internal IOMMU API and external
vfio IOMMU uAPI to get there.  For example, I don't expect we have
support at the IOMMU driver level to reshuffle ptes when an IOMMU
super page is broken.  Instead, for an unmap operation, the IOMMU API
allows the driver to return a larger number of unmapped pages than
requested.  I'd be nervous about an agreed baseline for modifications
to pages covered by an IOMMU super page as well.

The vfio IOMMU type1 uAPI is largely a reflection of this internal API
with further restrictions for tracking and accounting of user mappings.
Therefore we don't allow mappings that modify or overlap existing
mappings, nor do we allow an unmap which bisects any existing mapping.
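To make the two type1 restrictions above concrete, here is a toy model of them. It is only a sketch: `ToyType1` and its methods are hypothetical names, not the kernel's vfio_iommu_type1 code, and it ignores pinning, accounting and super pages entirely.

```python
# Toy model of the mapping rules described above. Hypothetical names;
# this is NOT the kernel implementation, just the two rejection rules.

class ToyType1:
    def __init__(self):
        # Existing mappings as (iova, size) tuples, never overlapping.
        self.maps = []

    def map(self, iova, size):
        """Add a mapping; refuse anything that touches an existing one."""
        end = iova + size
        for m_iova, m_size in self.maps:
            if iova < m_iova + m_size and m_iova < end:
                raise ValueError("overlaps existing mapping")
        self.maps.append((iova, size))

    def unmap(self, iova, size):
        """Remove every mapping fully inside [iova, iova + size).

        Refuse the request if the range would cut an existing mapping
        in two. Returns the number of bytes unmapped.
        """
        end = iova + size
        hit = [(a, s) for a, s in self.maps if a < end and iova < a + s]
        for a, s in hit:
            if a < iova or a + s > end:
                raise ValueError("unmap would bisect an existing mapping")
        self.maps = [m for m in self.maps if m not in hit]
        return sum(s for _, s in hit)
```

With A at 0x0 and B at 0x1000 (4K each), mapping 0x800..0x1800 is refused as an overlap, unmapping only the second half of A is refused as a bisect, and unmapping 0x0..0x2000 removes both A and B in one call.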
To support a memory map approach (which implicitly negates those sorts
of rules) we'd need to know if the IOMMU driver itself can atomically
handle arbitrary maps and unmaps, performing any necessary super page
fix-ups atomically.  The internal mechanics of the vfio IOMMU would
need to change quite a bit too for tracking and pinning, I suspect.

Do we necessarily need a memory map ioctl for this, or could it be the
QEMU code that compares the old and new maps to trigger map and unmap
ioctls?  For example (AIUI) our race is that if we have contiguous
memory regions A and B and flatview_simplify() tries to expand A and
delete B, we'll see a series of listener notifications deleting A and B
and adding A'.  But the vfio QEMU code could parse the memory map to
determine that old A + B is functionally equivalent to A' and do
nothing.  Do you foresee any breakdowns for such an approach?

Hotplug concerns me in that a new device only has the current
simplified flatview, e.g. we only know A' rather than A + B, so we
can't get back to A + !B like a device with more history could.

Thanks,
Alex
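As an illustration of the "old A + B is functionally equivalent to A'" check suggested above, here is a minimal sketch. It assumes each region is a hypothetical (iova, size, host address) tuple; the helper names are made up for the example and are not QEMU functions. It covers only the equivalence test, not the hotplug case, where the old A + B split is no longer known.

```python
# Sketch of comparing an old and a new flat memory map. Regions are
# hypothetical (iova, size, host_addr) tuples, not QEMU data structures.

def coalesce(regions):
    """Merge neighbours that are contiguous in both IOVA and host address."""
    merged = []
    for iova, size, host in sorted(regions):
        if merged:
            p_iova, p_size, p_host = merged[-1]
            if p_iova + p_size == iova and p_host + p_size == host:
                # Same backing, directly adjacent: extend the previous entry.
                merged[-1] = (p_iova, p_size + size, p_host)
                continue
        merged.append((iova, size, host))
    return merged

def functionally_equivalent(old, new):
    """True if both maps translate every IOVA to the same host address."""
    return coalesce(old) == coalesce(new)

# A and B are adjacent with adjacent backing; A' spans both.
A = (0x0000, 0x1000, 0x70000000)
B = (0x1000, 0x1000, 0x70001000)
A_prime = (0x0000, 0x2000, 0x70000000)

if functionally_equivalent([A, B], [A_prime]):
    pass  # nothing to do: skip the unmap/map ioctls entirely
```

If B were backed by a non-adjacent host address, the two maps would not coalesce to the same list and the usual unmap and map sequence would still be needed.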