Hi,

Your idea to search the relevant VTDAddressSpace and call its notifier
will probably work. Next week I'll try to implement it (for now with the
costly scan of each context). I'm still not sure whether populating the
MemoryRegion will suffice for hot-plugging a vfio device, but I'll try to
look into it.
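Concretely, here is roughly what I plan to try first. This is an untested
sketch, not final code: vtd_page_invalidate_notify is a placeholder name of
mine, and it reuses vtd_get_did_dev() from the patch quoted below plus the
existing vtd_as_by_busptr / VTDBus lookup structures in intel_iommu.c:

/*
 * Untested sketch: walk every VTDAddressSpace created so far (the costly
 * scan of each context) and fire the IOMMU notifier for those whose
 * context entry resolves to the invalidated domain.
 */
static void vtd_page_invalidate_notify(IntelIOMMUState *s, uint16_t domain_id,
                                       hwaddr addr, uint8_t am)
{
    GHashTableIter bus_it;
    gpointer value;

    /* vtd_as_by_busptr: the existing PCIBus* -> VTDBus* hash table */
    g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
    while (g_hash_table_iter_next(&bus_it, NULL, &value)) {
        VTDBus *vtd_bus = value;
        int devfn;

        for (devfn = 0; devfn < VTD_PCI_DEVFN_MAX; devfn++) {
            VTDAddressSpace *vtd_as = vtd_bus->dev_as[devfn];
            uint16_t did;
            IOMMUTLBEntry entry;

            /* skip unused slots and devices in other domains */
            if (!vtd_as ||
                vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), devfn, &did) ||
                did != domain_id) {
                continue;
            }

            /* unmap notification for the invalidated range */
            entry.target_as = NULL;
            entry.iova = addr & VTD_PAGE_MASK_4K;
            entry.translated_addr = 0;
            entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
            entry.perm = IOMMU_NONE;
            memory_region_notify_iommu(&vtd_as->iommu, entry);
        }
    }
}

The re-map notifies per 4K page would follow the unmap notify, as in the
patch below; I left them out to keep the sketch short.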
As far as I understand the memory_region_iommu_replay function, it still
scans the whole 64-bit address space, and therefore may hang the VM for a
long time.

Aviv.

On Thu, May 26, 2016 at 11:58 PM Alex Williamson
<alex.william...@redhat.com> wrote:

> On Mon, 23 May 2016 11:53:42 -0600
> Alex Williamson <alex.william...@redhat.com> wrote:
>
> > On Sat, 21 May 2016 19:19:50 +0300
> > "Aviv B.D" <bd.a...@gmail.com> wrote:
> >
> > > From: "Aviv Ben-David" <bd.a...@gmail.com>
> > >
> >
> > Some commentary about the changes necessary to achieve $SUBJECT would
> > be nice here.
> >
> > > Signed-off-by: Aviv Ben-David <bd.a...@gmail.com>
> > > ---
> > >  hw/i386/intel_iommu.c          | 69 ++++++++++++++++++++++++++++++++++++++++--
> > >  hw/i386/intel_iommu_internal.h |  2 ++
> > >  hw/vfio/common.c               | 11 +++++--
> > >  include/hw/i386/intel_iommu.h  |  4 +++
> > >  include/hw/vfio/vfio-common.h  |  1 +
> > >  5 files changed, 81 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 410f810..128ec7c 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -43,6 +43,9 @@ static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
> > >  #define VTD_DPRINTF(what, fmt, ...) do {} while (0)
> > >  #endif
> > >
> > > +static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> > > +                                    uint8_t devfn, VTDContextEntry *ce);
> > > +
> > >  static void vtd_define_quad(IntelIOMMUState *s, hwaddr addr, uint64_t val,
> > >                              uint64_t wmask, uint64_t w1cmask)
> > >  {
> > > @@ -126,6 +129,22 @@ static uint32_t vtd_set_clear_mask_long(IntelIOMMUState *s, hwaddr addr,
> > >      return new_val;
> > >  }
> > >
> > > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t devfn, uint16_t * domain_id)
> > > +{
> > > +    VTDContextEntry ce;
> > > +    int ret_fr;
> > > +
> > > +    assert(domain_id);
> > > +
> > > +    ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > > +    if (ret_fr){
> > > +        return -1;
> > > +    }
> > > +
> > > +    *domain_id = VTD_CONTEXT_ENTRY_DID(ce.hi);
> > > +    return 0;
> > > +}
> > > +
> > >  static uint64_t vtd_set_clear_mask_quad(IntelIOMMUState *s, hwaddr addr,
> > >                                          uint64_t clear, uint64_t mask)
> > >  {
> > > @@ -724,9 +743,6 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
> > >      }
> > >
> > >      if (!vtd_context_entry_present(ce)) {
> > > -        VTD_DPRINTF(GENERAL,
> > > -                    "error: context-entry #%"PRIu8 "(bus #%"PRIu8 ") "
> > > -                    "is not present", devfn, bus_num);
> > >          return -VTD_FR_CONTEXT_ENTRY_P;
> > >      } else if ((ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI) ||
> > >                 (ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO)) {
> > > @@ -1033,18 +1049,58 @@ static void vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
> > >                                  &domain_id);
> > >  }
> > >
> > > +static void vtd_iotlb_page_invalidate_vfio(IntelIOMMUState *s, uint16_t domain_id,
> > > +                                           hwaddr addr, uint8_t am)
> > > +{
> > > +    VFIOGuestIOMMU * giommu;
> > > +
> >
> > VT-d parsing VFIO private data structures, nope this is not a good idea.
> >
> > > +    QLIST_FOREACH(giommu, &(s->giommu_list), iommu_next){
> > > +        VTDAddressSpace *vtd_as = container_of(giommu->iommu, VTDAddressSpace, iommu);
> >
> > VT-d needs to keep track of its own address spaces and call the iommu
> > notifier, it should have no visibility whatsoever that there are vfio
> > devices attached.
> >
> > > +        uint16_t vfio_domain_id;
> > > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn, &vfio_domain_id);
> > > +        int i=0;
> > > +        if (!ret && domain_id == vfio_domain_id){
> > > +            IOMMUTLBEntry entry;
> > > +
> > > +            /* do vfio unmap */
> > > +            VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d", addr, am);
> > > +            entry.target_as = NULL;
> > > +            entry.iova = addr & VTD_PAGE_MASK_4K;
> > > +            entry.translated_addr = 0;
> > > +            entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> > > +            entry.perm = IOMMU_NONE;
> > > +            memory_region_notify_iommu(giommu->iommu, entry);
> > > +
> > > +            /* do vfio map */
> > > +            VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", addr, am);
> > > +            /* call to vtd_iommu_translate */
> > > +            for (i = 0; i < (1 << am); i++, addr+=(1 << VTD_PAGE_SHIFT_4K)){
> > > +                IOMMUTLBEntry entry = s->iommu_ops.translate(giommu->iommu, addr, IOMMU_NO_FAIL);
> > > +                if (entry.perm != IOMMU_NONE){
> > > +                    memory_region_notify_iommu(giommu->iommu, entry);
> > > +                }
> > > +            }
> > > +        }
> > > +    }
> > > +}
> > > +
>
> I think I see what you're trying to do here, find the VTDAddressSpaces
> that vfio knows about and determine which are affected by an
> invalidation, but I think it really just represents a flaw in the VT-d
> code not really matching real hardware. As I understand the real
> hardware, per device translations use the bus:devfn to get the context
> entry. The context entry contains both the domain_id and a pointer to
> the page table. Note that VT-d requires that domain_id and page table
> pointers are consistent for all entries, there cannot be two separate
> context entries with the same domain_id that point to different page
> tables. So really, VTDAddressSpace should be based at the domain, not
> the context entry. Multiple context entries can point to the same
> domain, and thus the same VTDAddressSpace. Doing so would make the
> invalidation trivial, simply lookup the VTDAddressSpace based on the
> domain_id, get the MemoryRegion, and fire off
> memory_region_notify_iommu()s, which then get {un}mapped through vfio
> without any direct interaction. Unfortunately I think that means that
> intel-iommu needs some data structure surgery.
>
> With the current code, I think the best we could do would be to look
> through every context for matching domain_ids. For each one, use that
> bus:devfn to get the VTDAddressSpace and thus the MemoryRegion and
> call a notify for each. That's not only an inefficient lookup, but
> requires a notify per matching bus:devfn since each uses a separate
> VTDAddressSpace when we should really only need a notify per domain_id.
>
> Also, I haven't figured it out yet, but I think we're also going to
> need to actually populate the MemoryRegion rather than just use it to
> send notifies, otherwise replay won't work, which means that a device
> added to a domain with existing mappings wouldn't work. Thanks,
>
> Alex
>
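P.S. Just to confirm I understand the domain-based scheme you describe:
with VTDAddressSpace keyed by domain, the invalidation path would reduce
to something like this (sketch only -- domain_as_by_id is a hypothetical
hash table from domain_id to a per-domain VTDAddressSpace that today's
intel_iommu.c does not have; adding it is the data structure surgery you
mention):

/*
 * Sketch of the domain-based lookup: "domain_as_by_id" is hypothetical,
 * mapping domain_id to the single VTDAddressSpace for that domain.
 */
static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                                             uint16_t domain_id,
                                             hwaddr addr, uint8_t am)
{
    /* One VTDAddressSpace per domain makes the lookup O(1). */
    VTDAddressSpace *vtd_as = g_hash_table_lookup(s->domain_as_by_id,
                                                  GUINT_TO_POINTER(domain_id));
    IOMMUTLBEntry entry;

    if (!vtd_as) {
        return; /* no context entry maps to this domain yet */
    }

    entry.target_as = NULL;
    entry.iova = addr & VTD_PAGE_MASK_4K;
    entry.translated_addr = 0;
    entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
    entry.perm = IOMMU_NONE;

    /*
     * A single notify per domain: vfio (or any other listener) reacts
     * through its registered notifier, and VT-d never touches
     * VFIOGuestIOMMU or any other vfio-private structure.
     */
    memory_region_notify_iommu(&vtd_as->iommu, entry);
}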