On Wed, Dec 7, 2022 at 6:13 AM Peter Xu <pet...@redhat.com> wrote:
>
> It seems not super clear on when iova_tree is used, and why. Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
>
> Also comment for the map/unmap messages, on how they're used and
> implications (e.g. unmap can be larger than the mapped ranges).
>
> Suggested-by: Jason Wang <jasow...@redhat.com>
> Signed-off-by: Peter Xu <pet...@redhat.com>
> ---
> v3:
> - Adjust according to Eric's comment
> ---
>  include/exec/memory.h         | 28 ++++++++++++++++++++++++++
>  include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..269ecb873b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -129,6 +129,34 @@ struct IOMMUTLBEntry {
>  /*
>   * Bitmap for different IOMMUNotifier capabilities. Each notifier can
>   * register with one or multiple IOMMU Notifier capability bit(s).
> + *
> + * Normally there're two use cases for the notifiers:
> + *
> + *   (1) When the device needs accurate synchronizations of the vIOMMU page
> + *       tables, it needs to register with both MAP|UNMAP notifies (which
> + *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).
> + *
> + *       Regarding to accurate synchronization, it's when the notified
> + *       device maintains a shadow page table and must be notified on each
> + *       guest MAP (page table entry creation) and UNMAP (invalidation)
> + *       events (e.g. VFIO). Both notifications must be accurate so that
> + *       the shadow page table is fully in sync with the guest view.
> + *
> + *   (2) When the device doesn't need accurate synchronizations of the
> + *       vIOMMU page tables, it needs to register only with UNMAP or
> + *       DEVIOTLB_UNMAP notifies.
> + *
> + *       It's when the device maintains a cache of IOMMU translations
> + *       (IOTLB) and is able to fill that cache by requesting translations
> + *       from the vIOMMU through a protocol similar to ATS (Address
> + *       Translation Service).
> + *
> + *       Note that in this mode the vIOMMU will not maintain a shadowed
> + *       page table for the address space, and the UNMAP messages can be
> + *       actually larger than the real invalidations (just like how the
> + *       Linux IOMMU driver normally works, where an invalidation can be
> + *       enlarged as long as it still covers the target range). The IOMMU

I just spotted this when testing your fix for DSI:

    assert(entry->iova >= notifier->start && entry_end <= notifier->end);

Do we need to remove this (though that would seem to be a partial revert of
03c7140c1a0336af3d4fca768de791b9c0e2b128)?

Thanks

> + *       notifiee should be able to take care of over-sized invalidations.
>   */
>  typedef enum {
>      IOMMU_NOTIFIER_NONE = 0,
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 46d973e629..89dcbc5e1e 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -109,7 +109,43 @@ struct VTDAddressSpace {
>      QLIST_ENTRY(VTDAddressSpace) next;
>      /* Superset of notifier flags that this address space has */
>      IOMMUNotifierFlag notifier_flags;
> -    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
> +    /*
> +     * @iova_tree traces mapped IOVA ranges.
> +     *
> +     * The tree is not needed if no MAP notifier is registered with current
> +     * VTD address space, because all guest invalidate commands can be
> +     * directly passed to the IOMMU UNMAP notifiers without any further
> +     * reshuffling.
> +     *
> +     * The tree OTOH is required for MAP typed iommu notifiers for a few
> +     * reasons.
> +     *
> +     * Firstly, there's no way to identify whether an PSI (Page Selective
> +     * Invalidations) or DSI (Domain Selective Invalidations) event is an
> +     * MAP or UNMAP event within the message itself. Without having prior
> +     * knowledge of existing state vIOMMU doesn't know whether it should
> +     * notify MAP or UNMAP for a PSI message it received when caching mode
> +     * is enabled (for MAP notifiers).
> +     *
> +     * Secondly, PSI messages received from guest driver can be enlarged in
> +     * range, covers but not limited to what the guest driver wanted to
> +     * invalidate. When the range to invalidates gets bigger than the
> +     * limit of a PSI message, it can even become a DSI which will
> +     * invalidate the whole domain.
> +     * If the vIOMMU directly notifies the
> +     * registered device with the unmodified range, it may confuse the
> +     * registered drivers (e.g. vfio-pci) on either:
> +     *
> +     *   (1) Trying to map the same region more than once (for
> +     *       VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
> +     *
> +     *   (2) Trying to UNMAP a range that is still partially mapped.
> +     *
> +     * That accuracy is not required for UNMAP-only notifiers, but it is a
> +     * must-to-have for notifiers registered with MAP events, because the
> +     * vIOMMU needs to make sure the shadow page table is always in sync
> +     * with the guest IOMMU pgtables for a device.
> +     */
> +    IOVATree *iova_tree;
>  };
>
>  struct VTDIOTLBEntry {
> --
> 2.37.3
>
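
To make the question about the assert concrete: if UNMAP messages may be larger than the notifier's range, one alternative to asserting containment would be to clamp the event to the notifier window before delivering it. The snippet below is only a rough sketch of that idea, not QEMU code. NotifierWindow, UnmapRange and clamp_unmap_to_window() are hypothetical names made up for illustration, and both ranges are assumed to use inclusive end addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the QEMU structures. */
typedef struct {
    uint64_t start;   /* first address the notifier covers (inclusive) */
    uint64_t end;     /* last address the notifier covers (inclusive) */
} NotifierWindow;

typedef struct {
    uint64_t iova;    /* first address of the invalidated range */
    uint64_t last;    /* last address of the invalidated range (inclusive) */
} UnmapRange;

/*
 * Clamp a possibly over-sized UNMAP range to the notifier window.
 * Returns false when the ranges do not intersect; *out is then untouched.
 */
static inline bool clamp_unmap_to_window(const NotifierWindow *w,
                                         const UnmapRange *in,
                                         UnmapRange *out)
{
    if (in->iova > w->end || in->last < w->start) {
        return false;             /* nothing in the window to invalidate */
    }
    out->iova = in->iova < w->start ? w->start : in->iova;
    out->last = in->last > w->end ? w->end : in->last;
    return true;
}
```

With something like this, an enlarged PSI or a DSI that covers more than the notifier's range would be trimmed rather than tripping the assert. Whether that is the right behaviour for every notifier type is a separate question.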