> -----Original Message-----
> From: Tian, Kevin <kevin.t...@intel.com>
> Sent: Thursday, April 4, 2019 10:58 AM
> To: Peter Xu <pet...@redhat.com>; Elijah Shakkour
> <elija...@mellanox.com>
> Cc: Knut Omang <knut.om...@oracle.com>; Michael S. Tsirkin
> <m...@redhat.com>; Alex Williamson <alex.william...@redhat.com>;
> Marcel Apfelbaum <marcel.apfelb...@gmail.com>; Stefan Hajnoczi
> <stefa...@gmail.com>; qemu-devel@nongnu.org
> Subject: RE: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
>
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Thursday, April 4, 2019 3:00 PM
> >
> > On Wed, Apr 03, 2019 at 10:10:35PM +0000, Elijah Shakkour wrote:
> >
> > [...]
> >
> > > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > > >
> > > > > > > > >   -trace enable="vtd_*"
> > > > > > > > >
> > > > > > > > > In case it dumps anything useful for you.
> > > > > >
> > > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > > "
> > > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > vtd_switch_address_space Device 01:00.1 switching address space (iommu enabled=1)
> > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > vtd_err Detected invalid context entry when trying to sync shadow page table
> > > > >
> > > > > These lines mean that the guest sent a device invalidation to
> > > > > your VF but the IOMMU found that the device context entry is
> > > > > missing.
> > > > >
> > > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2d007003 gen 0 -> gen 2
> > > > > > vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7
> > > > >
> > > > > This line should not exist in latest QEMU. Are you sure you're
> > > > > using the latest QEMU?
> > > >
> > > > I moved now to QEMU 4.0 RC2.
> > > > This is what I get now:
> > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 0x2f007003 gen 0 -> gen 1
> > > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero iova=0xf0d29000, level=0x2slpte=0x29f6008f7)
> > > > vtd_fault_disabled Fault processing disabled for context entry
> > > > qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=01:00:01, iova=0xf0d29000)
> > > > Unassigned mem read 00000000f0d29000
> > > >
> > > > I'm not familiar with vIOMMU registers, but I noticed that I must
> > > > report snoop control support to Hyper-V (i.e. bit 7 in the extended
> > > > capability register of the vIOMMU) in order to satisfy IOMMU
> > > > support for SRIOV.
> > > > vIOMMU.ecap before 0xf00f5e
> > > > vIOMMU.ecap after 0xf00fde
> > > > But I see that vIOMMU doesn't really support snoop control.
> > > > Could this be the problem that fails IOVA range check in this
> > > > function vtd_iova_range_check()?
> > >
> > > Sorry, I meant the SLPTE reserved non-zero check failure in
> > > vtd_slpte_nonzero_rsvd(),
> > > and NOT the IOVA range check failure (since the range check didn't fail).
> >
> > Probably. Currently the VT-d emulation does not support snoop control,
> > and if you modify only that ecap bit you will probably hit this
> > problem: the guest kernel will then set the SNP bit in the IOMMU page
> > table entries, which violates the reserved-bit check in the emulation
> > code, and you see these errors.
> >
> > As for implementing Snoop Control for the Intel IOMMU for real (which
> > corresponds to VT-d ecap bit 7): I confess I'm not 100% clear on what
> > "snooping" means and what we need to do as an emulator. Quoting from
> > the spec:
> >
> >   "Snoop behavior for a memory access (to a translation structure
> >   entry or access to the mapped page) specifies if the access is
> >   coherent (snoops the processor caches) or not."
> >
> > If it is only a capability showing whether the hardware is capable of
> > snooping processor caches, then I don't think we need to do much here
> > as an emulator of VT-d, simply because when we access the data we are
> > still on the processor's side (we're emulating only the IOMMU
> > behavior), so the cache should always be coherent from the POV of the
> > guest vCPUs, just like how the processors provide cache coherence
> > between two cores (so IMHO the VT-d emulation code can run on one
> > core/thread, and the vCPU that runs the guest IOMMU driver can run on
> > another core/thread). If so, maybe we can simply declare support for
> > it, but we at least also need to remove the SNP bit from the
> > vtd_paging_entry_rsvd_field[] array to reflect that we understand
> > that bit.
> >
> > CCing Alex and Kevin to see whether I'm misunderstanding, or in case
> > of any further input on the snooping support.
> >
>
> For software DMA, yes, snoop is guaranteed since it's just CPU access.
>
> However, for a VFIO device, i.e. hardware DMA, snoop should be reported
> based on the physical IOMMU capability. It's fine to report no snoop
> control on the vIOMMU (current state) even when it's physically
> supported. The only consequence is that the L1 VMM must favor guest
> cache attributes instead of forcing WB in the L1 EPT when doing nested
> passthrough. However, it's incorrect to report snoop control on the
> vIOMMU when it's physically not supported; otherwise the L1 VMM may
> force WB in the L1 EPT and enable the snoop field in the vIOMMU
> 2nd-level PTE on the assumption that hardware snoop is guaranteed
> (when it isn't). Then it becomes a correctness issue.
>
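To make the failure above concrete, here is a minimal sketch (Python, not QEMU code) that decodes the values from the logs. ecap bit 7 as snoop control comes from this thread; SNP as bit 11 of a second-level PTE is my reading of the VT-d spec and should be treated as an assumption. The one-bit "reserved mask" is a toy stand-in for illustration, not the real contents of vtd_paging_entry_rsvd_field[].

```python
ECAP_SC = 1 << 7      # snoop control capability: ecap bit 7 (stated in the thread)
SLPTE_SNP = 1 << 11   # SNP bit in a second-level PTE (assumption, from the VT-d spec)


def reports_snoop_control(ecap: int) -> bool:
    """True if the (v)IOMMU advertises snoop control in its ecap register."""
    return bool(ecap & ECAP_SC)


def snp_is_set(slpte: int) -> bool:
    """True if the guest set the SNP bit in this second-level PTE."""
    return bool(slpte & SLPTE_SNP)


def fails_reserved_check(slpte: int, rsvd_mask: int) -> bool:
    """Toy reserved-bit check: any bit under rsvd_mask must be zero."""
    return bool(slpte & rsvd_mask)


if __name__ == "__main__":
    # The only difference between the two ecap values is the snoop control bit.
    print(hex(0xf00f5e ^ 0xf00fde))
    # The SLPTE from the QEMU 4.0 RC2 log has SNP set ...
    print(snp_is_set(0x29f6008f7))
    # ... so it trips a mask that still treats SNP as reserved,
    print(fails_reserved_check(0x29f6008f7, SLPTE_SNP))
    # and passes once SNP is no longer part of the mask.
    print(fails_reserved_check(0x29f6008f7, 0))
```

In other words, flipping the ecap bit alone makes the guest set SNP, while the emulation still rejects it as reserved, which is consistent with Peter's explanation.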
If my device is fully emulated, can I ignore the SNP bit in the SLPTE?
What is the cost of ignoring it in that case? What could go wrong?
(I tried ignoring it, and translations seem to work for me now.)

> The thing is a bit tricky regarding two VFIO devices that sit under two
> pIOMMUs, one supporting snoop and the other not, which leaves us
> two options:
>
> 1). create one vIOMMU without snoop control. Current state and safe.
> 2). create two vIOMMUs, one supporting snoop and the other not.
>     Then report each VFIO device to the vIOMMU with the matching snoop
>     capability. Matches hardware topology but adds extra config burden
>     and footprint.
>
> Thanks,
> Kevin
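As I understand Kevin's rule, it could be sketched like this (Python, purely illustrative). The Device type and both helper names are hypothetical and not part of QEMU. Treating emulated devices as never restricting the answer is my reading of "for software DMA yes snoop is guaranteed", not something confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description, for illustration only."""
    name: str
    is_vfio: bool                # True = hardware DMA behind a physical IOMMU
    piommu_snoop: bool = False   # snoop control of the backing pIOMMU (VFIO only)


def viommu_may_report_snoop(devices) -> bool:
    """A single vIOMMU may report snoop control only if every VFIO device
    sits behind a pIOMMU that supports it. Emulated devices (software DMA)
    are assumed not to restrict the answer."""
    return all(d.piommu_snoop for d in devices if d.is_vfio)


def split_vfio_by_snoop(devices):
    """Option 2: group VFIO devices for two vIOMMUs, one with snoop control
    and one without. Emulated devices are left out of the split."""
    with_snoop = [d for d in devices if d.is_vfio and d.piommu_snoop]
    without_snoop = [d for d in devices if d.is_vfio and not d.piommu_snoop]
    return with_snoop, without_snoop


if __name__ == "__main__":
    emu = Device("emulated-vf", is_vfio=False)
    a = Device("vfio-a", is_vfio=True, piommu_snoop=True)
    b = Device("vfio-b", is_vfio=True, piommu_snoop=False)
    print(viommu_may_report_snoop([emu, a]))     # all VFIO devices can snoop
    print(viommu_may_report_snoop([emu, a, b]))  # vfio-b cannot
    print(split_vfio_by_snoop([emu, a, b]))
```

If that reading is right, a guest with only emulated devices (my case) would fall on the safe side of the rule, but I'd appreciate confirmation.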