On Thu, May 17, 2018 at 04:59:16PM +0800, Peter Xu wrote:
> During IOVA page table walking, there is a special case when the PSI
> covers one whole PDE (Page Directory Entry, which contains 512 Page
> Table Entries) or more. In the past, we skipped that entry and did not
> notify the IOMMU notifiers. This is not correct. We should send an
> UNMAP notification to registered UNMAP notifiers in this case.
>
> For UNMAP-only notifiers, this might leave IOTLB entries cached in the
> devices even after they became invalid. For MAP/UNMAP notifiers like
> vfio-pci, this will cause stale page mappings.
>
> This special case doesn't trigger often, but it is easily triggered by
> nested device assignment, since in that case we'll possibly map the
> whole L2 guest RAM region into the device's IOVA address space
> (several GBs at least), which is far bigger than a normal kernel
> driver's usage of the device (typically tens of MBs).
>
> Without this patch applied to L1 QEMU, nested device assignment to L2
> guests will dump some errors like:
>
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                     0x7f89a920d000) = -17 (File exists)
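To make the quoted numbers concrete: a PDE covers 512 PTEs, so with
4KiB pages (the usual VT-d second-level page size; an assumption here,
not something the patch states) each whole PDE spans 2MiB of IOVA
space. A standalone back-of-the-envelope sketch, with made-up names:

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative constants only; names are not from the QEMU source. */
    #define PAGE_SHIFT      12          /* assumed 4KiB pages */
    #define PTES_PER_PDE    512         /* from the commit message */

    int main(void)
    {
        uint64_t pde_span = (1ULL << PAGE_SHIFT) * PTES_PER_PDE; /* 2MiB */
        uint64_t l2_ram   = 4ULL << 30; /* example: 4GiB of L2 guest RAM */

        /* Each whole PDE inside the PSI range used to be skipped with
         * no UNMAP notification sent to the notifiers. */
        printf("PDE span: %llu MiB\n",
               (unsigned long long)(pde_span >> 20));
        printf("whole PDEs covered by a 4GiB invalidation: %llu\n",
               (unsigned long long)(l2_ram / pde_span));
        return 0;
    }

So invalidating a multi-GB nested-assignment mapping walks over
thousands of these entries, every one of which hit the buggy skip path.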
Could you add a description of the security implications, if any, and
their severity?

> Acked-by: Jason Wang <jasow...@redhat.com>
> [peterx: rewrite the commit message]
> Signed-off-by: Peter Xu <pet...@redhat.com>

Cc stable here? And for any other patches as well?

> ---
>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>  1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>
>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>  /**
>   * vtd_page_walk_level - walk over specific level for IOVA range
>   *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>           */
>          entry_valid = read_cur | write_cur;
>
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>          if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>              /* NOTE: this is only meaningful if entry_valid == true */
>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>              if (!entry_valid && !notify_unmap) {
>                  trace_vtd_page_walk_skip_perm(iova, iova_next);
>                  goto next;
>              }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>              }
>          } else {
>              if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                  goto next;
>              }
>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> --
> 2.17.0
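For readers unfamiliar with the hook interface: after this change, a
hook registered as a vtd_page_walk_hook also receives one UNMAP event
(perm == IOMMU_NONE, translated_addr zeroed) per invalid non-leaf
entry, instead of only per-page events. Below is a minimal standalone
sketch of such a hook, using simplified stand-ins for QEMU's
IOMMUTLBEntry and access flags; the hook body is a hypothetical
logger, not the real vfio handler:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    /* Simplified stand-ins for QEMU types, for illustration only. */
    typedef enum { IOMMU_NONE = 0, IOMMU_RO, IOMMU_WO, IOMMU_RW } IOMMUAccessFlags;

    typedef struct {
        uint64_t iova;
        uint64_t translated_addr;
        uint64_t addr_mask;
        IOMMUAccessFlags perm;
    } IOMMUTLBEntry;

    /* Hypothetical hook with the vtd_page_walk_hook shape from the patch. */
    static int log_hook(IOMMUTLBEntry *entry, void *private)
    {
        (void)private;
        if (entry->perm == IOMMU_NONE) {
            /* New with this patch: an invalid whole-PDE (or larger) range
             * arrives as a single UNMAP covering the entire region. */
            printf("UNMAP iova=0x%" PRIx64 " size=0x%" PRIx64 "\n",
                   entry->iova, entry->addr_mask + 1);
        } else {
            printf("MAP   iova=0x%" PRIx64 " -> 0x%" PRIx64 "\n",
                   entry->iova, entry->translated_addr);
        }
        return 0;
    }

    int main(void)
    {
        /* What the walk would deliver for one invalidated whole PDE. */
        IOMMUTLBEntry e = {
            .iova            = 0x40000000,
            .translated_addr = 0,                 /* zeroed when invalid */
            .addr_mask       = (2ULL << 20) - 1,  /* 2MiB PDE span */
            .perm            = IOMMU_NONE,
        };
        return log_hook(&e, NULL);
    }

A MAP/UNMAP notifier such as vfio-pci would react to the IOMMU_NONE
case by tearing down the corresponding host mapping, which is what
prevents the stale mappings described in the commit message.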