On Wed, Apr 19, 2017 at 03:27:52PM +0800, Lan Tianyu wrote: > Hi All: > Sorry for later response. > On 2017年04月18日 17:04, Liu, Yi L wrote: > >> -----Original Message----- > >> From: Peter Xu [mailto:pet...@redhat.com] > >> Sent: Tuesday, April 18, 2017 3:27 PM > >> To: Liu, Yi L <yi.l....@intel.com> > >> Cc: qemu-devel@nongnu.org; Lan, Tianyu <tianyu....@intel.com>; Michael S . > >> Tsirkin <m...@redhat.com>; Jason Wang <jasow...@redhat.com>; Marcel > >> Apfelbaum <mar...@redhat.com>; David Gibson <da...@gibson.dropbear.id.au> > >> Subject: Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough > >> (PT) > >> > >> On Tue, Apr 18, 2017 at 06:02:44AM +0000, Liu, Yi L wrote: > >>>>> Hi Peter, > >>>>> > >>>>> Skip address space switching is a good idea to support Passthru mode. > >>>>> However, without the address space, the vfio notifier would not be > >>>>> registered, thus vIOMMU emulator has no way to connect to host. It > >>>>> is no harm if there is only map/unmap notifier. But if we have > >>>>> more notifiers other than map/unmap, it may be a problem. > >>>>> > >>>>> I think we need to reconsider it here. > >>>> > >>>> For now I think as switching is good to us in general. Could I know > >>>> more context about this? Would it be okay to work on top of this in the > >>>> future? > >>>> > >>> > >>> Let me explain. For vSVM enabling, it needs to add new notifier to > >>> bind gPASID table to host. If an assigned device uses SVM in guest, as > >>> designed guest IOMMU driver would set "pt" for it. This is to make > >>> sure the host second-level page table stores GPA->HPA mapping. So that > >>> pIOMMU can do nested translation to achieve gVA->GPA GPA->HPA mapping. > >> > >> That should mean that in the guest it will only enable first level > >> translation, while in > >> the host it'll be nested (first + second)? Then you should be trapping the > >> guest > >> extended context entry invalidation, then pass the PASID table pointer > >> downward to > >> the host IOMMU, am I correct? > > > > exactly. > > > >> > >>> > >>> So the situation is if we want to keep GPA->HPA mapping, we should > >>> skip address space switch, but the vfio listener would not know vIOMMU > >>> is added, so no vfio notifier would be registered. But if we do the > >>> switch, the GPA->HPA mapping is unmapped. And need another way to build > >>> the > >> GPA->HPA mapping. > >> > >> If my understanding above is correct, current IOMMU notifier may not > >> satisfy your > >> need. If you see the current notify interface, it's delivering > >> IOMMUTLBEntry. I > >> suspect it only suites for IOTLB notifications, but not if you want to > >> pass some > >> pointer (one GPA) downward somewhere. > > > > Yeah, you got it more than absolutely. One of my patch is to modify the > > para to be > > void * and let each notifiers to translate separately. I think it should be > > a reasonable > > change. > > > > Supposedly, I would send RFC for vSVM soon. We may talk more it at that > > thread.
I suspect whether it'll be good that we hang everything under current IOMMU notifiers... But sure I'm looking forward to your RFC. :) > > > >>> > >>> I think we may have two choice here. Pls let me know your ideas. > >>> > >>> 1. skip the switch for "pt" mode, find other way to register vfio > >>> notifiers. not sure if this is workable. Just a quick thought. > >>> > >>> 2. do the switch, and rebuild GPA->HPA mapping if device use "pt" > >>> mode. For this option, I also have two way to build the GPA->HPA mapping. > >>> a) walk all the memory region sections in address_space_memory, and build > >>> the > >> mapping. > >>> address_space_memory.dispatch->map.sections, sections stores all the > >>> mapped > >> sections. > >>> > >>> b) let guest iommu driver mimics a 1:1 mapping for the devices use > >>> "pt" mode. in this way, the GPA->HPA mapping could be setup by walking > >>> the guest SL page table. This is consistent with your vIOVA replay > >>> solution. > >> > >> I'll prefer (1). Reason explained above. > >> > >>> > >>> Also I'm curious if Tianyu's fault report framework needs to register new > >>> notifiers. > >> > >> For Tianyu's case, I feel like we need other kind of notifier as well, but > >> not the > >> current IOTLB one. And, of course in this case it'll be in an reverted > >> direction, which > >> is from device to vIOMMU. > >> > >> (I am thinking whether it's a good time now to let any PCI device able to > >> fetch its > >> direct owner IOMMU object, then it can register anything it want onto > >> that object > >> maybe?) > >> > > Hmmm, let's see Tianyu's comment as he's driving fault report. Let's keep > > in touch on > > this Passthru-Mode enabling. > > In my previous RFC patchset of fault event reporting, I registered fault > notifier when there is a VFIO group attached to VFIO container and used > the address space to check whether vIOMMU is added. The address space is > returned by vtd_host_dma_iommu(). vtd_find_add_as() initializes device's > IOMMU memory region and put it into device's address space as root > memory region. > Does this make sense? > > @@ -1103,6 +1132,14 @@ static int vfio_connect_container(VFIOGroup > *group, AddressSpace *as, > goto listener_release_exit; > } > > + if (memory_region_is_iommu(container->space->as->root)) { I would suggest we don't play with as->root any more. After vtd vfio series, this may not be true if passthrough mode is enabled (then the root may be switched to an system memory alias). I don't know the best way to check this, one alternative might be that we check whether container->space->as == system_memory(), it should be workable, but in a slightly hackish way. Thanks, > + if (vfio_set_iommu_fault_notifier(container)) { > + error_setg_errno(errp, -ret, > + "Fail to set IOMMU fault notifier"); > + goto listener_release_exit; > + } > + } > + > container->initialized = true; > > -- > Best regards > Tianyu Lan -- Peter Xu