On Fri, May 01, 2015 at 05:12:45PM +1000, Alexey Kardashevskiy wrote: > On 05/01/2015 02:23 PM, David Gibson wrote: > >On Fri, May 01, 2015 at 02:01:17PM +1000, Alexey Kardashevskiy wrote: > >>On 04/29/2015 04:31 PM, David Gibson wrote: > >>>On Sat, Apr 25, 2015 at 10:14:50PM +1000, Alexey Kardashevskiy wrote: > >>>>In order to support memory pre-registration, we need a way to track > >>>>the use of every registered memory region and only allow unregistration > >>>>if a region is not in use anymore. So we need a way to tell from what > >>>>region the just cleared TCE was from. > >>>> > >>>>This adds a userspace view of the TCE table into iommu_table struct. > >>>>It contains userspace address, one per TCE entry. The table is only > >>>>allocated when the ownership over an IOMMU group is taken which means > >>>>it is only used from outside of the powernv code (such as VFIO). > >>>> > >>>>Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > >>>>--- > >>>>Changes: > >>>>v9: > >>>>* fixed code flow in error cases added in v8 > >>>> > >>>>v8: > >>>>* added ENOMEM on failed vzalloc() > >>>>--- > >>>> arch/powerpc/include/asm/iommu.h | 6 ++++++ > >>>> arch/powerpc/kernel/iommu.c | 18 ++++++++++++++++++ > >>>> arch/powerpc/platforms/powernv/pci-ioda.c | 22 ++++++++++++++++++++-- > >>>> 3 files changed, 44 insertions(+), 2 deletions(-) > >>>> > >>>>diff --git a/arch/powerpc/include/asm/iommu.h > >>>>b/arch/powerpc/include/asm/iommu.h > >>>>index 7694546..1472de3 100644 > >>>>--- a/arch/powerpc/include/asm/iommu.h > >>>>+++ b/arch/powerpc/include/asm/iommu.h > >>>>@@ -111,9 +111,15 @@ struct iommu_table { > >>>> unsigned long *it_map; /* A simple allocation bitmap for > >>>> now */ > >>>> unsigned long it_page_shift;/* table iommu page size */ > >>>> struct iommu_table_group *it_table_group; > >>>>+ unsigned long *it_userspace; /* userspace view of the table */ > >>> > >>>A single unsigned long doesn't seem like enough. > >> > >>Why single? This is an array. > > > >As in single per page. > > > Sorry, I am not following you here. > It is per IOMMU page. MAP/UNMAP work with IOMMU pages which are fully backed > with either system page or a huge page. > > > > > >>>How do you know > >>>which process's address space this address refers to? > >> > >>It is a current task. Multiple userspaces cannot use the same > >>container/tables. > > > >Where is that enforced? > > > It is accessed from VFIO DMA map/unmap which are ioctls() to a container's > fd which is per a process.
Usually, but what enforces that. If you open a container fd, then fork(), and attempt to map from both parent and child, what happens? > Same for KVM - when it registers IOMMU groups in > KVM, fd's of opened IOMMU groups are passed there. Or I did not understand > the question... > > > >More to the point, that's a VFIO constraint, but it's here affecting > >the design of a structure owned by the platform code. > > Right. But keeping in mind KVM, I cannot think of any better design here. > > > >[snip] > >>>> static void pnv_pci_ioda_setup_opal_tce_kill(struct pnv_phb *phb, > >>>>@@ -2062,12 +2071,21 @@ static long pnv_pci_ioda2_create_table(struct > >>>>iommu_table_group *table_group, > >>>> int nid = pe->phb->hose->node; > >>>> __u64 bus_offset = num ? pe->tce_bypass_base : 0; > >>>> long ret; > >>>>+ unsigned long *uas, uas_cb = sizeof(*uas) * (window_size >> page_shift); > >>>>+ > >>>>+ uas = vzalloc(uas_cb); > >>>>+ if (!uas) > >>>>+ return -ENOMEM; > >>> > >>>I don't see why this is allocated both here as well as in > >>>take_ownership. > >> > >>Where else? The only alternative is vfio_iommu_spapr_tce but I really do not > >>want to touch iommu_table fields there. > > > >Well to put it another way, why isn't take_ownership calling create > >itself (or at least a common helper). > > I am trying to keep DDW stuff away from platform-oriented > arch/powerpc/kernel/iommu.c which main purpose is to implement > iommu_alloc()&co. It already has > > I'd rather move it_userspace allocation completely to vfio_iommu_spapr_tce > (should have done earlier, actually), would this be ok? Yeah, that makes more sense to me. > >Clearly the it_userspace table needs to have lifetime which matches > >the TCE table itself, so there should be a single function that marks > >the beginning of that joint lifetime. > > > No. it_userspace lives as long as the platform code does not control the > table. For IODA2 it is equal for the lifetime of the table, for IODA1/P5IOC2 > it is not. Right, I was imprecise. I was thinking of the ownership change as an end/beginning of lifetime even for IODA1, because the table has to be fully cleared at that point, even though it's not actually reallocated. > >>>Isn't this function used for core-kernel users of the > >>>iommu as well, in which case it shouldn't need the it_userspace. > >> > >> > >>No. This is an iommu_table_group_ops callback which calls what the platform > >>code calls (pnv_pci_create_table()) plus allocates this it_userspace thing. > >>The callback is only called from VFIO. > > > >Ok. > > > >As touched on above it seems more like this should be owned by VFIO > >code than the platform code. > > Agree now :) I'll move the allocation to VFIO. Thanks! > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
pgpOq1tc0irSw.pgp
Description: PGP signature
_______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev