On Fri, May 13, 2016 at 07:45:14AM +0000, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Friday, May 13, 2016 3:42 PM
> >
> > On Fri, May 13, 2016 at 03:30:27PM +0800, Jike Song wrote:
> > > On 05/13/2016 02:43 PM, Neo Jia wrote:
> > > > On Fri, May 13, 2016 at 02:22:37PM +0800, Jike Song wrote:
> > > >> On 05/13/2016 10:41 AM, Tian, Kevin wrote:
> > > >>>> From: Neo Jia [mailto:c...@nvidia.com]
> > > >>>> Sent: Friday, May 13, 2016 3:49 AM
> > > >>>>
> > > >>>>>
> > > >>>>>> Perhaps one possibility would be to allow the vgpu driver
> > > >>>>>> to register map and unmap callbacks. The unmap callback
> > > >>>>>> might provide the invalidation interface that we're so far
> > > >>>>>> missing. The combination of map and unmap callbacks might
> > > >>>>>> simplify the Intel approach of pinning the entire VM memory
> > > >>>>>> space, i.e. for each map callback do a translation (pin) and
> > > >>>>>> dma_map_page, for each unmap do a dma_unmap_page and
> > > >>>>>> release the translation.
> > > >>>>>
> > > >>>>> Yes, adding map/unmap ops in the pGPU driver (I assume you are
> > > >>>>> referring to gpu_device_ops as implemented in Kirti's patch)
> > > >>>>> sounds like a good idea, satisfying both: 1) keeping vGPU purely
> > > >>>>> virtual; 2) dealing with the Linux DMA API to achieve hardware
> > > >>>>> IOMMU compatibility.
> > > >>>>>
> > > >>>>> PS, this has very little to do with pinning wholly or
> > > >>>>> partially. Intel KVMGT once had the whole guest
> > > >>>>> memory pinned, only because we used a spinlock, which can't
> > > >>>>> sleep at runtime. We have removed that spinlock in another of
> > > >>>>> our upstreaming efforts, not here but for the i915 driver, so
> > > >>>>> probably no biggie.
> > > >>>>>
> > > >>>>
> > > >>>> OK, then you guys don't need to pin everything. The next
> > > >>>> question will be whether you can send the pinning request from your
> > > >>>> mediated driver backend to request memory pinning, like we have
> > > >>>> demonstrated in the v3 patch, functions vfio_pin_pages and
> > > >>>> vfio_unpin_pages?
> > > >>>>
> > > >>>
> > > >>> Jike, can you confirm this statement? My feeling is that we don't
> > > >>> have such logic in our device model to figure out which pages
> > > >>> need to be pinned on demand. So currently pin-everything is the same
> > > >>> requirement on both the KVM and Xen sides...
> > > >>
> > > >> [Correct me in case of any neglect:)]
> > > >>
> > > >> IMO the ultimate reason to pin a page is for DMA. Accessing RAM
> > > >> from a GPU is certainly a DMA operation. The DMA facility of most
> > > >> platforms, IGD and NVIDIA GPU included, is not capable of
> > > >> fault-and-retry handling.
> > > >>
> > > >> As for vGPU solutions like the ones Nvidia and Intel provide: whenever
> > > >> the Guest sets up mappings for the memory region it uses for GPU
> > > >> access, that is intercepted by the Host, so it's safe to pin a page
> > > >> only before it gets used by the Guest. This probably doesn't need the
> > > >> device model to change :)
> > > >
> > > > Hi Jike,
> > > >
> > > > Just out of curiosity, how does the host intercept this before it
> > > > goes on the bus?
> > > >
> > > Hi Neo,
> > >
> > > [apologies if I mis-expressed myself, bad English ..]
> > >
> > > I was talking about intercepting the setting-up of GPU page tables,
> > > not the DMA itself.
> > > For current Intel GPUs, the page tables are
> > > MMIO registers or simply RAM pages, called the GTT (Graphics Translation
> > > Table); a write to a GTT entry from the Guest is always
> > > intercepted by the Host.
> >
> > Hi Jike,
> >
> > Thanks for the details; one more question: if the page tables are guest RAM,
> > how do you
> > intercept them from the host? I can see how they get intercepted when they are in the MMIO
> > range.
> >
>
> We use the page tracking framework, which was recently added to KVM,
> to mark RAM pages as read-only, so write accesses are intercepted and forwarded to the
> device model.
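For illustration only, here is a rough sketch of the map/unmap callback idea
discussed above (pin + dma_map_page on map, dma_unmap_page + unpin on unmap).
The vgpu_dev type, the callback shapes, and the vfio_pin_pages()/
vfio_unpin_pages() prototypes below are assumptions modelled on the v3 patch
discussion, not the actual interface:

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/mm.h>

struct vgpu_dev;	/* placeholder for whatever the vGPU core really uses */

/* assumed helpers, modelled on the v3 patch discussion */
extern int vfio_pin_pages(struct vgpu_dev *vgpu, unsigned long *gfns,
			  int npage, int prot, unsigned long *pfns);
extern int vfio_unpin_pages(struct vgpu_dev *vgpu, unsigned long *gfns,
			    int npage);

/* map callback: the guest has established a GPU mapping for @gfn */
static int vgpu_map_one(struct vgpu_dev *vgpu, struct device *dev,
			unsigned long gfn, dma_addr_t *iova)
{
	unsigned long pfn;
	int ret;

	/* translate (pin): guest pfn -> host pfn */
	ret = vfio_pin_pages(vgpu, &gfn, 1, IOMMU_READ | IOMMU_WRITE, &pfn);
	if (ret < 0)
		return ret;

	/* DMA-map through the normal Linux DMA API (hardware-IOMMU aware) */
	*iova = dma_map_page(dev, pfn_to_page(pfn), 0, PAGE_SIZE,
			     DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, *iova)) {
		vfio_unpin_pages(vgpu, &gfn, 1);
		return -EFAULT;
	}
	return 0;
}

/* unmap callback: the guest tore the mapping down, so invalidate it */
static void vgpu_unmap_one(struct vgpu_dev *vgpu, struct device *dev,
			   unsigned long gfn, dma_addr_t iova)
{
	dma_unmap_page(dev, iova, PAGE_SIZE, DMA_BIDIRECTIONAL);
	vfio_unpin_pages(vgpu, &gfn, 1);
}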
Yes, I am aware of that patchset from Guangrong. So far the interfaces all
require a struct kvm *, copied from https://lkml.org/lkml/2015/11/30/644:

- kvm_page_track_add_page(): add the page to the tracking pool; after
  that, the specified access on that page will be tracked

- kvm_page_track_remove_page(): remove the page from the tracking pool;
  the specified access on the page is no longer tracked after the last
  user is gone

void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
                             enum kvm_page_track_mode mode);
void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
                                enum kvm_page_track_mode mode);

Really curious how you are going to get access to the struct kvm *kvm, or
are you relying on userfaultfd to track the write faults only, as part of
the QEMU userfault thread?

Thanks,
Neo

>
> Thanks
> Kevin
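For reference, a minimal sketch of how a kernel-side device model that does
have access to the guest's struct kvm might use these interfaces to intercept
guest writes to GTT pages. The add/remove prototypes are the ones quoted
above from the patchset; the notifier registration and the track_write
signature are taken from the same patchset and are assumptions here, since
they may differ in the version that was eventually merged:

#include <linux/kvm_host.h>
#include <asm/kvm_page_track.h>

/* called by KVM when the guest writes a write-tracked (read-only) page */
static void gtt_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
			    const u8 *new, int bytes)
{
	/*
	 * @gpa falls inside a guest page holding GTT entries: apply the
	 * written entry to the shadow GTT / device model here.
	 */
}

static struct kvm_page_track_notifier_node gtt_tracker = {
	.track_write = gtt_track_write,
};

/* register the write-intercept callback with this VM */
static void gtt_tracking_init(struct kvm *kvm)
{
	kvm_page_track_register_notifier(kvm, &gtt_tracker);
}

/* start intercepting guest writes to one page of the guest's GTT */
static void gtt_track_page(struct kvm *kvm, gfn_t gfn)
{
	kvm_page_track_add_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
}

/* stop tracking the page once the guest no longer uses it for the GTT */
static void gtt_untrack_page(struct kvm *kvm, gfn_t gfn)
{
	kvm_page_track_remove_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
}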