On Wed, Sep 30, 2015 at 10:59:52AM +0200, Laurent Vivier wrote:
>
>
> On 30/09/2015 04:13, David Gibson wrote:
> > When we have guest visible IOMMUs, we allow notifiers to be registered
> > which will be informed of all changes to IOMMU mappings.  This is used by
> > vfio to keep the host IOMMU mappings in sync with guest IOMMU mappings.
> >
> > However, unlike with a memory region listener, an iommu notifier won't be
> > told about any mappings which already exist in the (guest) IOMMU at the
> > time it is registered.  This can cause problems if hotplugging a VFIO
> > device onto a guest bus which had existing guest IOMMU mappings, but didn't
> > previously have any VFIO devices (and hence no host IOMMU mappings).
> >
> > This adds a memory_region_iommu_replay() function to handle this case.  It
> > replays any existing mappings in an IOMMU memory region to a specified
> > notifier.  Because the IOMMU memory region doesn't internally remember the
> > granularity of the guest IOMMU it has a small hack where the caller must
> > specify a granularity at which to replay mappings.
> >
> > If there are finer mappings in the guest IOMMU these will be reported in
> > the iotlb structures passed to the notifier which it must handle (probably
> > causing it to flag an error).  This isn't new - the VFIO iommu notifier
> > must already handle notifications about guest IOMMU mappings too short
> > for it to represent in the host IOMMU.
> >
> > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> > ---
> >  include/exec/memory.h | 13 +++++++++++++
> >  memory.c              | 20 ++++++++++++++++++++
> >  2 files changed, 33 insertions(+)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5baaf48..0f07159 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -583,6 +583,19 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
> >
> >  /**
> > + * memory_region_iommu_replay: replay existing IOMMU translations to
> > + * a notifier
> > + *
> > + * @mr: the memory region to observe
> > + * @n: the notifier to which to replay iommu mappings
> > + * @granularity: Minimum page granularity to replay notifications for
> > + * @is_write: Whether to treat the replay as a translate "write"
> > + *     through the iommu
> > + */
> > +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> > +                                hwaddr granularity, bool is_write);
> > +
> > +/**
> >   * memory_region_unregister_iommu_notifier: unregister a notifier for
> >   * changes to IOMMU translation entries.
> >   *
> > diff --git a/memory.c b/memory.c
> > index ef87363..1b03d22 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1403,6 +1403,26 @@ void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
> >      notifier_list_add(&mr->iommu_notify, n);
> >  }
> >
> > +void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n,
> > +                                hwaddr granularity, bool is_write)
> > +{
> > +    hwaddr addr;
> > +    IOMMUTLBEntry iotlb;
> > +
> > +    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> > +        iotlb = mr->iommu_ops->translate(mr, addr, is_write);
>
> in iotlb, there is an "address_mask", on spapr, it is copied from
> "page_shift", which is SPAPR_TCE_PAGE_SHIFT (12 -> 4k).
>
> At first glance, we would like to use it to scan the memory region,
> but as granularity could be a greater value, I think it is a better choice.

Using address_mask doesn't quite work.  *If* you start with an
existing, valid translation, then you can use address_mask to skip to
the end of it - that might be a useful optimization in future,
particularly if the guest IOMMU has variable page sizes.

But if you start on an address that doesn't have a current valid
translation in the IOMMU, then address_mask gets set to ~0, so it
doesn't give you any information on where to try next for a valid
mapping.  That's what the granularity parameter is needed for.

> But the question is: why the iotlb page_size is not equal to the
> granularity given by VFIO_IOMMU_GET_INFO _IO ?

Well, the iotlb page size is the page size from the *guest* iommu,
whereas VFIO_IOMMU_GET_INFO tells you the page size of the *host*
iommu.  In practice, they'll probably be the same, at least on setups
likely to work well with VFIO, but in theory they could be different.

> > +        if (iotlb.perm != IOMMU_NONE) {
> > +            n->notify(n, &iotlb);
> > +        }
> > +
> > +        /* if (2^64 - MR size) < granularity, it's possible to get an
> > +         * infinite loop here.  This should catch such a wraparound */
> > +        if ((addr + granularity) < addr) {
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> >  void memory_region_unregister_iommu_notifier(Notifier *n)
> >  {
> >      notifier_remove(n);
> >
>
> As my question is not about this particular patch but on another
> existing part, I can say:
>
> Reviewed-by: Laurent Vivier <lviv...@redhat.com>
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
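
To make the address_mask versus granularity point concrete, here is a
rough standalone sketch of the replay loop.  It is not QEMU code: the
toy_* types and functions are hypothetical stand-ins for hwaddr,
IOMMUTLBEntry and the translate callback, and the toy IOMMU is assumed
to use fixed 4 KiB pages.  It only models the two behaviours discussed
above: an unmapped address reports a mask of ~0, so the caller-supplied
granularity has to drive the scan, and the wraparound check stops the
loop when addr + granularity overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's hwaddr. */
typedef uint64_t toy_addr;

/* Hypothetical, cut-down stand-in for IOMMUTLBEntry. */
typedef struct {
    toy_addr iova;      /* start of the guest IOMMU page */
    toy_addr addr_mask; /* page size - 1, or ~0 when nothing is mapped */
    bool valid;         /* models perm != IOMMU_NONE */
} toy_entry;

#define TOY_PAGE_SHIFT 12 /* assumed 4 KiB guest IOMMU pages */

/* Toy guest IOMMU: only the listed page numbers have translations. */
typedef struct {
    toy_addr size;
    const toy_addr *mapped_pages;
    size_t n_mapped;
} toy_iommu;

static toy_entry toy_translate(const toy_iommu *mmu, toy_addr addr)
{
    toy_addr page = addr >> TOY_PAGE_SHIFT;

    for (size_t i = 0; i < mmu->n_mapped; i++) {
        if (mmu->mapped_pages[i] == page) {
            toy_entry hit = { page << TOY_PAGE_SHIFT,
                              ((toy_addr)1 << TOY_PAGE_SHIFT) - 1, true };
            return hit;
        }
    }

    /* No translation: the mask says nothing about where to look next. */
    toy_entry miss = { 0, ~(toy_addr)0, false };
    return miss;
}

/* Walk the region in granularity-sized steps and count the valid
 * mappings found (a real notifier would be called instead). */
static size_t toy_replay(const toy_iommu *mmu, toy_addr granularity)
{
    size_t notified = 0;

    assert(granularity != 0);
    for (toy_addr addr = 0; addr < mmu->size; addr += granularity) {
        toy_entry e = toy_translate(mmu, addr);

        if (e.valid) {
            notified++;
        }
        /* Same wraparound guard as in the patch. */
        if (addr + granularity < addr) {
            break;
        }
    }
    return notified;
}
```

With pages 0, 2 and 5 mapped in an eight-page region, replaying at a
4 KiB granularity visits all three mappings, while replaying at 8 KiB
steps over page 5.  That is the cost of the caller choosing a
granularity coarser than the guest IOMMU page size.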