On Wed, 25 Jan 2017 18:40:56 +0100 Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 25/01/2017 18:36, Alex Williamson wrote:
> >> You probably should also put a comment about why VFIO does *not* need to
> >> keep a reference between vfio_dma_map and vfio_dma_unmap (which doesn't
> >> sound easy to do either).  Would any well-behaved guest invalidate the
> >> IOMMU page tables before a memory hot-unplug?
> >
> > Hmm, we do take a reference in vfio_listener_region_add(), but this is
> > of course to the iommu region not to the RAM region we're translating.
> > In the non-vIOMMU case we would be holding a reference to the memory
> > region backing a DMA mapping.  I would expect a well behaved guest to
> > evacuate DMA mappings targeting a hotplug memory region before it gets
> > ejected, but how much do we want to rely on well behaved guests.
>
> It depends of what happens if they aren't.  I think it's fine (see other
> message), but taking a reference for each mapping entry isn't so easy
> because the unmap case doesn't know the old memory region.

If we held a reference to the memory region from the mapping path and
walk the IOMMU page table to generate the unmap, then we really should
get to the same original memory region, right?  The vfio iommu notifier
should only be mapping native page sizes of the IOMMU, 4k/2M/1G.  The
problem is that it's a lot of overhead to flush the entire address
space that way vs the single invalidation Peter is trying to enable
here.  It's actually similar to how the type1 iommu works in the kernel
though, we can unmap by iova because we ask the iommu for the iova->pfn
translation in order to unpin the page.

I do agree with your description in the other message about how things
would work for a memory hot-unplug w/o unmap though, which does seem to
imply that we don't need that reference.  Thanks,

Alex