On Wed, Jan 25, 2017 at 01:09:47PM -0700, Alex Williamson wrote:
> On Wed, 25 Jan 2017 20:42:19 +0100
> Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
> > On 25/01/2017 19:36, Alex Williamson wrote:
> > >> It depends on what happens if they aren't. I think it's fine (see
> > >> other message), but taking a reference for each mapping entry isn't
> > >> so easy because the unmap case doesn't know the old memory region.
> > > If we held a reference to the memory region from the mapping path
> > > and walked the IOMMU page table to generate the unmap, then we
> > > really should get to the same original memory region, right? The
> > > vfio iommu notifier should only be mapping native page sizes of the
> > > IOMMU, 4k/2M/1G. The problem is that it's a lot of overhead to
> > > flush the entire address space that way vs the single invalidation
> > > Peter is trying to enable here. It's actually similar to how the
> > > type1 iommu works in the kernel though: we can unmap by iova because
> > > we ask the iommu for the iova->pfn translation in order to unpin the
> > > page.
> > 
> > But in the kernel you can trust the IOMMU page tables because you
> > build them; here instead it's the guest's page tables that you'd
> > walk, right? You cannot trust the guest.
> 
> Yes, you're right, we're not shadowing the VT-d page tables, we're
> working on the explicit invalidation model. So there could be
> anything, or nothing, in the page tables when we go to try to look up
> the unref. So clearly taking that reference without a shadow page
> table would be the wrong approach. Thanks,
IIUC the above discussion, moving the RCU read lock/unlock out of
vfio_get_vaddr() and into its caller would be the nicest approach here:
the caller can then hold the lock across both the translation and the
use of the returned vaddr, so the memory region cannot go away while
the host address is still in use. Thanks to you both for helping verify
and confirm the problem!
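For concreteness, below is an untested sketch of what I mean, written
against hw/vfio/common.c. I am assuming vfio_iommu_map_notify(),
vfio_dma_map() and vfio_dma_unmap() keep their current shapes, and I
have elided the IOVA offset handling and some error paths, so please
read it as illustration rather than the final patch:

/* Untested sketch against hw/vfio/common.c; details trimmed. */
static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
                           bool *read_only)
{
    MemoryRegion *mr;
    hwaddr xlat;
    hwaddr len = iotlb->addr_mask + 1;
    bool writable = iotlb->perm & IOMMU_WO;

    /*
     * Caller must hold the RCU read lock; the MemoryRegion returned
     * by the translation below is only stable while it is held.
     */
    mr = address_space_translate(&address_space_memory,
                                 iotlb->translated_addr,
                                 &xlat, &len, writable);
    if (!memory_region_is_ram(mr)) {
        error_report("iommu map to non memory area %"HWADDR_PRIx, xlat);
        return false;
    }

    /* Translation truncates length to the IOMMU page size. */
    if (len & iotlb->addr_mask) {
        error_report("iommu has granularity incompatible with target AS");
        return false;
    }

    *vaddr = memory_region_get_ram_ptr(mr) + xlat;
    *read_only = !writable || mr->readonly;

    return true;
}

static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
{
    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
    VFIOContainer *container = giommu->container;
    hwaddr iova = iotlb->iova;  /* IOVA offset handling elided */
    void *vaddr;
    bool read_only;
    int ret;

    /*
     * Take the RCU read lock here, around both the vaddr lookup and
     * the map call, rather than inside vfio_get_vaddr(): vaddr must
     * not be used after rcu_read_unlock().
     */
    rcu_read_lock();

    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
        if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
            goto out;
        }
        ret = vfio_dma_map(container, iova,
                           iotlb->addr_mask + 1, vaddr, read_only);
    } else {
        ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
    }
    if (ret) {
        error_report("vfio dma (un)map failed, iova=0x%"HWADDR_PRIx, iova);
    }
out:
    rcu_read_unlock();
}

-- peterx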