On Thu, Oct 20, 2016 at 10:11:15PM +0300, Aviv B.D. wrote: [...]
> > > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > +                                             uint16_t domain_id,
> > > +                                             hwaddr addr, uint8_t am)
> > > +{
> > 
> > The logic of this function looks strange to me.
> > 
> > > +    IntelIOMMUNotifierNode *node;
> > > +
> > > +    QLIST_FOREACH(node, &(s->notifiers_list), next) {
> > > +        VTDAddressSpace *vtd_as = node->vtd_as;
> > > +        uint16_t vfio_domain_id;
> > > +        int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus),
> > > +                                  vtd_as->devfn, &vfio_domain_id);
> > > +        if (!ret && domain_id == vfio_domain_id) {
> > > +            IOMMUTLBEntry entry;
> > > +
> > > +            /* notify unmap */
> > > +            if (node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> > 
> > First of all, if we are talking about VFIO, notifier_flag should
> > always be MAP|UNMAP. So in that case, for newly mapped entries, it
> > looks like we will first send an UNMAP, then a MAP?
> > 
> You are correct, there is no valid reason to have notifier_flag other
> than MAP|UNMAP, at least for VFIO.

Not sure whether you know about the upstream vhost work: vhost is going
to support Intel IOMMU, and in that case it will only register the UNMAP
notifier, not MAP.

> I'm not sure that in the future there won't be a good reason to do
> otherwise, so my code supports this scenario...

My point is, I think you should not send this notify unconditionally.
IMO the logic should be simpler here, like:

  foreach (page in invalidation range) {
      IOMMUTLBEntry entry = s->iommu_ops.translate();
      if (entry.perm != IOMMU_NONE && notifier_flag & NOTIFIER_MAP) {
          /* this is map, user requested MAP flag */
          memory_region_notify_iommu(entry);
      } else if (entry.perm == IOMMU_NONE &&
                 notifier_flag & NOTIFIER_UNMAP) {
          /* this is unmap, user requested UNMAP */
          entry = ...; /* setup the entry */
          memory_region_notify_iommu(entry);
      }
  }

This is to follow your logic. I don't know whether this is efficient
enough; maybe it is good enough for a first version. The problem is,
when you call translate(), you will need to walk the page table every
time from the root dir.
A faster way may be to provide a function that walks a specific address
range. If you are going to implement the replay logic that Alex/David
have mentioned, maybe that will help too (walk over the whole 64-bit
range).

> > > +                VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d",
> > > +                            addr, am);
> > > +                entry.target_as = &address_space_memory;
> > > +                entry.iova = addr & VTD_PAGE_MASK_4K;
> > > +                entry.translated_addr = 0;
> > > +                entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT_4K + am);
> > > +                entry.perm = IOMMU_NONE;
> > > +                memory_region_notify_iommu(&node->vtd_as->iommu, entry);
> > > +            }
> > > +
> > > +            /* notify map */
> > > +            if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> > > +                hwaddr original_addr = addr;
> > > +                VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d",
> > > +                            addr, am);
> > > +                while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> > > +                    /* call to vtd_iommu_translate */
> > > +                    IOMMUTLBEntry entry = s->iommu_ops.translate(
> > > +                                              &node->vtd_as->iommu,
> > > +                                              addr,
> > > +                                              IOMMU_NO_FAIL);
> > > +                    if (entry.perm != IOMMU_NONE) {
> > > +                        addr += entry.addr_mask + 1;
> > > +                        memory_region_notify_iommu(&node->vtd_as->iommu,
> > > +                                                   entry);
> > > +                    } else {
> > > +                        addr += VTD_PAGE_SIZE;
> > 
> > IIUC, here is the point where we find that "the page is gone" (so this
> > is an UNMAP invalidation), and we should do memory_region_notify_iommu()
> > for the whole area with IOMMU_NONE. Then we can just quit the loop,
> > since subsequent translate()s should fail as well if the first page is
> > missing.
> > 
> > Please correct me if I am wrong.
> > 
> If I remember correctly, I encountered a few cases where there was a
> hole of unmapped memory in the middle of otherwise mapped pages. I
> believe it was with Linux kernel 4.4, but I'm not sure.

I see. Thanks.

-- peterx