On Fri, Jun 05, 2015 at 04:35:25PM +1000, Alexey Kardashevskiy wrote:
> The existing implementation accounts the whole DMA window in
> the locked_vm counter. This is going to be worse with multiple
> containers and huge DMA windows. Also, real-time accounting would require
> additional tracking of accounted pages due to the page size difference -
> IOMMU uses 4K pages and system uses 4K or 64K pages.
>
> Another issue is that actual pages pinning/unpinning happens on every
> DMA map/unmap request. This does not affect the performance much now as
> we spend way too much time now on switching context between
> guest/userspace/host but this will start to matter when we add in-kernel
> DMA map/unmap acceleration.
>
> This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
> New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
> 2 new ioctls to register/unregister DMA memory -
> VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
> which receive user space address and size of a memory region which
> needs to be pinned/unpinned and counted in locked_vm.
> New IOMMU splits physical pages pinning and TCE table update
> into 2 different operations. It requires:
> 1) guest pages to be registered first
> 2) subsequent map/unmap requests to work only with pre-registered memory.
> For the default single window case this means that the entire guest
> (instead of 2GB) needs to be pinned before using VFIO.
> When a huge DMA window is added, no additional pinning will be
> required, otherwise it would be guest RAM + 2GB.
>
> The new memory registration ioctls are not supported by
> VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
> will require memory to be preregistered in order to work.
>
> The accounting is done per the user process.
>
> This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
> can do with v1 or v2 IOMMUs.
>
> In order to support memory pre-registration, we need a way to track
> the use of every registered memory region and only allow unregistration
> if a region is not in use anymore. So we need a way to tell what
> region the just cleared TCE was from.
>
> This adds a userspace view of the TCE table into iommu_table struct.
> It contains userspace address, one per TCE entry. The table is only
> allocated when the ownership over an IOMMU group is taken which means
> it is only used from outside of the powernv code (such as VFIO).
>
> As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
> DDW API), this creates a default DMA window for IODA2 for consistency.
>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> [aw: for the vfio related changes]
> Acked-by: Alex Williamson <alex.william...@redhat.com>
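
For readers following the thread, the register-then-map rule and the
"userspace view" bookkeeping described in the quoted text can be
illustrated with a small standalone model. This is only a sketch under
stated assumptions, not the kernel code from the patch: struct model,
model_register(), model_unregister(), model_map(), model_unmap(), the
error codes, the table sizes and the fixed 64K system page are all
hypothetical names and values chosen for illustration. The model also
does not check for overlapping regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the quoted text says IOMMU pages are 4K and
 * system pages are 4K or 64K. This model assumes 64K system pages. */
#define IOMMU_PAGE_SIZE (UINT64_C(1) << 12)
#define SYS_PAGE_SHIFT  16
#define SYS_PAGE_SIZE   (UINT64_C(1) << SYS_PAGE_SHIFT)
#define MAX_REGIONS     8
#define MAX_TCES        16

/* Hypothetical error codes for this model only. */
enum { OK = 0, ERR_INVALID = -1, ERR_BUSY = -2, ERR_NOT_REGISTERED = -3 };

struct mem_region {
    uint64_t ua;          /* userspace address of the region */
    uint64_t size;        /* size in bytes */
    unsigned long mapped; /* TCEs currently pointing into the region */
    bool used;
};

struct model {
    struct mem_region regions[MAX_REGIONS];
    uint64_t userspace_view[MAX_TCES]; /* one address per TCE, 0 = clear */
    unsigned long locked_vm;           /* accounted system pages */
};

/* Find the registered region that fully contains [ua, ua + len). */
static struct mem_region *find_region(struct model *m, uint64_t ua,
                                      uint64_t len)
{
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct mem_region *r = &m->regions[i];

        if (r->used && ua >= r->ua && len <= r->size &&
            ua - r->ua <= r->size - len)
            return r;
    }
    return NULL;
}

/* Registration is where accounting happens, once per region. */
int model_register(struct model *m, uint64_t ua, uint64_t size)
{
    if (ua == 0 || size == 0 || ((ua | size) & (SYS_PAGE_SIZE - 1)))
        return ERR_INVALID;
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct mem_region *r = &m->regions[i];

        if (!r->used) {
            *r = (struct mem_region){ .ua = ua, .size = size, .used = true };
            m->locked_vm += size >> SYS_PAGE_SHIFT;
            return OK;
        }
    }
    return ERR_INVALID; /* no free slot in this toy table */
}

/* Unregistration is refused while any TCE still uses the region. */
int model_unregister(struct model *m, uint64_t ua, uint64_t size)
{
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct mem_region *r = &m->regions[i];

        if (r->used && r->ua == ua && r->size == size) {
            if (r->mapped)
                return ERR_BUSY;
            m->locked_vm -= size >> SYS_PAGE_SHIFT;
            r->used = false;
            return OK;
        }
    }
    return ERR_NOT_REGISTERED;
}

/* Map works only on pre-registered memory and touches no accounting. */
int model_map(struct model *m, size_t entry, uint64_t ua)
{
    struct mem_region *r;

    if (entry >= MAX_TCES || m->userspace_view[entry] != 0 ||
        (ua & (IOMMU_PAGE_SIZE - 1)))
        return ERR_INVALID;
    r = find_region(m, ua, IOMMU_PAGE_SIZE);
    if (!r)
        return ERR_NOT_REGISTERED;
    r->mapped++;
    m->userspace_view[entry] = ua;
    return OK;
}

/* Unmap uses the userspace view to find which region the TCE was from. */
int model_unmap(struct model *m, size_t entry)
{
    struct mem_region *r;

    if (entry >= MAX_TCES || m->userspace_view[entry] == 0)
        return ERR_INVALID;
    r = find_region(m, m->userspace_view[entry], IOMMU_PAGE_SIZE);
    assert(r != NULL); /* a mapped TCE always has a live region here */
    r->mapped--;
    m->userspace_view[entry] = 0;
    return OK;
}
```

In this model, registering a 128K region accounts two 64K pages up
front, mapping a 4K IOMMU page inside it changes no accounting, and
unregistering fails with ERR_BUSY until the last TCE pointing into the
region has been cleared.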

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson