On Fri, May 13, 2016 at 07:45:14AM +0000, Tian, Kevin wrote:
> > From: Neo Jia [mailto:c...@nvidia.com]
> > Sent: Friday, May 13, 2016 3:42 PM
> >
> > On Fri, May 13, 2016 at 03:30:27PM +0800, Jike Song wrote:
> > > On 05/13/2016 02:43 PM, Neo Jia wrote:
> > > > On Fri, May 13, 2016 at 02:22:37PM +0800, Jike Song wrote:
> > > >> On 05/13/2016 10:41 AM, Tian, Kevin wrote:
> > > >>>> From: Neo Jia [mailto:c...@nvidia.com]
> > > >>>> Sent: Friday, May 13, 2016 3:49 AM
> > > >>>>
> > > >>>>>
> > > >>>>>> Perhaps one possibility would be to allow the vgpu driver
> > > >>>>>> to register map and unmap callbacks. The unmap callback
> > > >>>>>> might provide the invalidation interface that we're so far
> > > >>>>>> missing. The combination of map and unmap callbacks might
> > > >>>>>> simplify the Intel approach of pinning the entire VM memory
> > > >>>>>> space, i.e. for each map callback do a translation (pin) and
> > > >>>>>> dma_map_page, for each unmap do a dma_unmap_page and
> > > >>>>>> release the translation.
> > > >>>>>
> > > >>>>> Yes, adding map/unmap ops in the pGPU driver (I assume you are
> > > >>>>> referring to gpu_device_ops as implemented in Kirti's patch)
> > > >>>>> sounds like a good idea, satisfying both: 1) keeping vGPU purely
> > > >>>>> virtual; 2) dealing with the Linux DMA API to achieve hardware
> > > >>>>> IOMMU compatibility.
> > > >>>>>
> > > >>>>> PS, this has very little to do with pinning wholly or
> > > >>>>> partially. Intel KVMGT once had the whole guest
> > > >>>>> memory pinned, only because we used a spinlock, which can't
> > > >>>>> sleep at runtime. We have removed that spinlock in another of
> > > >>>>> our upstreaming efforts, not here but for the i915 driver, so
> > > >>>>> probably no biggie.
> > > >>>>>
> > > >>>>
> > > >>>> OK, then you guys don't need to pin everything. The next
> > > >>>> question will be whether you can send the pinning request from your
> > > >>>> mediated driver backend to request memory pinning, like we have
> > > >>>> demonstrated in the v3 patch, functions vfio_pin_pages and
> > > >>>> vfio_unpin_pages?
> > > >>>>
> > > >>>
> > > >>> Jike, can you confirm this statement? My feeling is that we don't
> > > >>> have such logic in our device model to figure out which pages
> > > >>> need to be pinned on demand. So currently pin-everything is the same
> > > >>> requirement on both the KVM and Xen sides...
> > > >>
> > > >> [Correct me in case of any neglect:)]
> > > >>
> > > >> IMO the ultimate reason to pin a page is for DMA. Accessing RAM
> > > >> from a GPU is certainly a DMA operation. The DMA facility of most
> > > >> platforms, IGD and NVIDIA GPU included, is not capable of
> > > >> fault-and-retry handling.
> > > >>
> > > >> As for vGPU solutions like the ones Nvidia and Intel provide: whenever
> > > >> the Guest sets up mappings for the memory region it uses for GPU
> > > >> access, that is intercepted by the Host, so it's safe to pin a page
> > > >> only before it gets used by the Guest. This probably doesn't need the
> > > >> device model to change :)
> > > >
> > > > Hi Jike,
> > > >
> > > > Just out of curiosity, how does the host intercept this before it
> > > > goes on the bus?
> > > >
> > > Hi Neo,
> > >
> > > [apologies if I mis-expressed myself, bad English ..]
> > >
> > > I was talking about intercepting the setting-up of GPU page tables,
> > > not the DMA itself.
> > > For current Intel GPUs, the page tables are
> > > MMIO registers or simply RAM pages, called the GTT (Graphics Translation
> > > Table); a write to a GTT entry from the Guest is always
> > > intercepted by the Host.
> >
> > Hi Jike,
> >
> > Thanks for the details; one more question: if the page tables are guest RAM,
> > how do you
> > intercept them from the host? I can see how they get intercepted when they are in the MMIO
> > range.
> >
>
> We use the page tracking framework, which was recently added to KVM,
> to mark RAM pages as read-only, so write accesses are intercepted and forwarded to the
> device model.
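For illustration only, here is a rough sketch of the map/unmap callback idea
discussed above (pin + dma_map_page on map, dma_unmap_page + unpin on unmap).
The vgpu_dev type, the callback shapes, and the vfio_pin_pages()/
vfio_unpin_pages() prototypes below are assumptions modelled on the v3 patch
discussion, not the actual interface:

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/mm.h>

struct vgpu_dev;	/* placeholder for whatever the vGPU core really uses */

/* assumed helpers, modelled on the v3 patch discussion */
extern int vfio_pin_pages(struct vgpu_dev *vgpu, unsigned long *gfns,
			  int npage, int prot, unsigned long *pfns);
extern int vfio_unpin_pages(struct vgpu_dev *vgpu, unsigned long *gfns,
			    int npage);

/* map callback: the guest has established a GPU mapping for @gfn */
static int vgpu_map_one(struct vgpu_dev *vgpu, struct device *dev,
			unsigned long gfn, dma_addr_t *iova)
{
	unsigned long pfn;
	int ret;

	/* translate (pin): guest pfn -> host pfn */
	ret = vfio_pin_pages(vgpu, &gfn, 1, IOMMU_READ | IOMMU_WRITE, &pfn);
	if (ret < 0)
		return ret;

	/* DMA-map through the normal Linux DMA API (hardware-IOMMU aware) */
	*iova = dma_map_page(dev, pfn_to_page(pfn), 0, PAGE_SIZE,
			     DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, *iova)) {
		vfio_unpin_pages(vgpu, &gfn, 1);
		return -EFAULT;
	}
	return 0;
}

/* unmap callback: the guest tore the mapping down, so invalidate it */
static void vgpu_unmap_one(struct vgpu_dev *vgpu, struct device *dev,
			   unsigned long gfn, dma_addr_t iova)
{
	dma_unmap_page(dev, iova, PAGE_SIZE, DMA_BIDIRECTIONAL);
	vfio_unpin_pages(vgpu, &gfn, 1);
}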
Yes, I am aware of that patchset from Guangrong. So far the interfaces all
require a struct kvm *, copied from https://lkml.org/lkml/2015/11/30/644:

- kvm_page_track_add_page(): add the page to the tracking pool; after
  that, the specified access on that page will be tracked

- kvm_page_track_remove_page(): remove the page from the tracking pool;
  the specified access on the page is no longer tracked after the last
  user is gone

void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
                             enum kvm_page_track_mode mode);
void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
                                enum kvm_page_track_mode mode);

Really curious how you are going to get access to the struct kvm *kvm, or
are you relying on userfaultfd to track the write faults only, as part of
the QEMU userfault thread?

Thanks,
Neo

>
> Thanks
> Kevin
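For reference, a minimal sketch of how a kernel-side device model that does
have access to the guest's struct kvm might use these interfaces to intercept
guest writes to GTT pages. The add/remove prototypes are the ones quoted
above from the patchset; the notifier registration and the track_write
signature are taken from the same patchset and are assumptions here, since
they may differ in the version that was eventually merged:

#include <linux/kvm_host.h>
#include <asm/kvm_page_track.h>

/* called by KVM when the guest writes a write-tracked (read-only) page */
static void gtt_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
			    const u8 *new, int bytes)
{
	/*
	 * @gpa falls inside a guest page holding GTT entries: apply the
	 * written entry to the shadow GTT / device model here.
	 */
}

static struct kvm_page_track_notifier_node gtt_tracker = {
	.track_write = gtt_track_write,
};

/* register the write-intercept callback with this VM */
static void gtt_tracking_init(struct kvm *kvm)
{
	kvm_page_track_register_notifier(kvm, &gtt_tracker);
}

/* start intercepting guest writes to one page of the guest's GTT */
static void gtt_track_page(struct kvm *kvm, gfn_t gfn)
{
	kvm_page_track_add_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
}

/* stop tracking the page once the guest no longer uses it for the GTT */
static void gtt_untrack_page(struct kvm *kvm, gfn_t gfn)
{
	kvm_page_track_remove_page(kvm, gfn, KVM_PAGE_TRACK_WRITE);
}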