On Thu, Apr 30, 2015 at 06:25:25PM +1000, Paul Mackerras wrote:
> On Thu, Apr 30, 2015 at 04:34:55PM +1000, David Gibson wrote:
> > On Sat, Apr 25, 2015 at 10:14:52PM +1000, Alexey Kardashevskiy wrote:
> > > We are adding support for DMA memory pre-registration to be used in
> > > conjunction with VFIO. The idea is that the userspace which is going to
> > > run a guest may want to pre-register a user space memory region so
> > > it all gets pinned once and never goes away. Having this done,
> > > a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> > > request. This is going to help with multiple pinning of the same memory
> > > and in-kernel acceleration of DMA requests.
> > >
> > > This adds a list of memory regions to mm_context_t. Each region consists
> > > of a header and a list of physical addresses. This adds API to:
> > > 1. register/unregister memory regions;
> > > 2. do final cleanup (which puts all pre-registered pages);
> > > 3. do userspace to physical address translation;
> > > 4. manage a mapped pages counter; when it is zero, it is safe to
> > > unregister the region.
> > >
> > > Multiple registration of the same region is allowed, kref is used to
> > > track the number of registrations.
> >
> > [snip]
> >
> > > +long mm_iommu_alloc(unsigned long ua, unsigned long entries,
> > > +		struct mm_iommu_table_group_mem_t **pmem)
> > > +{
> > > +	struct mm_iommu_table_group_mem_t *mem;
> > > +	long i, j;
> > > +	struct page *page = NULL;
> > > +
> > > +	list_for_each_entry_rcu(mem, &current->mm->context.iommu_group_mem_list,
> > > +			next) {
> > > +		if ((mem->ua == ua) && (mem->entries == entries))
> > > +			return -EBUSY;
> > > +
> > > +		/* Overlap? */
> > > +		if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
> > > +				(ua < (mem->ua + (mem->entries << PAGE_SHIFT))))
> > > +			return -EINVAL;
> > > +	}
> > > +
> > > +	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > > +	if (!mem)
> > > +		return -ENOMEM;
> > > +
> > > +	mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
> > > +	if (!mem->hpas) {
> > > +		kfree(mem);
> > > +		return -ENOMEM;
> > > +	}
> >
> > So, I've thought more about this and I'm really confused as to what
> > this is supposed to be accomplishing.
> >
> > I see that you need to keep track of what regions are registered, so
> > you don't double lock or unlock, but I don't see what the point of
> > actually storing the translations in hpas is.
> >
> > I had assumed it was so that you could later on get to the
> > translations in real mode when you do in-kernel acceleration.  But
> > that doesn't make sense, because the array is vmalloc()ed, so can't be
> > accessed in real mode anyway.
>
> We can access vmalloc'd arrays in real mode using real_vmalloc_addr().

Ah, ok.
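If that's the trick, then I guess the real-mode H_PUT_TCE path would
do something along these lines to get at the array (a rough sketch
just to check my understanding - the function name, the hpas element
type and the bounds handling are my guesses, not taken from the patch):

	static unsigned long rm_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
			unsigned long ua)
	{
		unsigned long entry, *va;

		if (ua < mem->ua)
			return 0;

		entry = (ua - mem->ua) >> PAGE_SHIFT;
		if (entry >= mem->entries)
			return 0;

		/*
		 * mem->hpas lives in vmalloc space; real_vmalloc_addr()
		 * walks the kernel page tables and returns the
		 * linear-mapping alias of the address, which is safe to
		 * dereference with the MMU off.
		 */
		va = real_vmalloc_addr(&mem->hpas[entry]);
		if (!va)
			return 0;

		return *va | (ua & ~PAGE_MASK);
	}

That would also explain why the translations are cached at
registration time rather than looked up on demand.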
> > I can't think of a circumstance in which you can use hpas where you
> > couldn't just walk the page tables anyway.
>
> The problem with walking the page tables is that there is no guarantee
> that the page you find that way is the page that was returned by the
> gup_fast() we did earlier.  Storing the hpas means that we know for
> sure that the page we're doing DMA to is one that we have an elevated
> page count on.
>
> Also, there are various points where a Linux PTE is made temporarily
> invalid for a short time.  If we happened to do a H_PUT_TCE on one cpu
> while another cpu was doing that, we'd get a spurious failure returned
> by the H_PUT_TCE.

I think we want this explanation in the commit message.  And/or in a
comment somewhere, I'm not sure.
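What I'd want spelled out is roughly the comment below, sitting above
the pinning loop - and since the hunk quoted above stops right after
the vzalloc(), the loop here is my reconstruction of what presumably
follows, not code from the patch:

	/*
	 * Pin each page and store its physical address now, at
	 * registration time.  Real-mode H_PUT_TCE must not walk the
	 * Linux page tables: a PTE may be transiently invalid, and the
	 * page found that way is not guaranteed to be the one
	 * gup_fast() elevated the refcount on.
	 */
	for (i = 0; i < entries; ++i) {
		if (get_user_pages_fast(ua + (i << PAGE_SHIFT),
				1 /* nr_pages */, 1 /* write */, &page) != 1) {
			/* Unpin what we pinned so far before failing */
			for (j = 0; j < i; ++j)
				put_page(pfn_to_page(mem->hpas[j] >> PAGE_SHIFT));
			vfree(mem->hpas);
			kfree(mem);
			return -EFAULT;
		}
		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
	}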
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson