On 07/07/2017 08:25, Bharat Bhushan wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, July 07, 2017 2:47 AM
>> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Jean-Philippe Brucker
>> <jean-philippe.bruc...@arm.com>; eric.auger....@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 06/07/2017 13:24, Bharat Bhushan wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>>>> Sent: Thursday, July 06, 2017 3:33 PM
>>>> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
>>>> <eric.au...@redhat.com>; eric.auger....@gmail.com;
>>>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>>>> robin.mur...@arm.com; christoffer.d...@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>>>> Also when setting up the msi-route with kvm_irqchip_add_msi_route()
>>>>>> we needed to provide the translated address.
>>>>>>> According to my understanding this is required because the kernel
>>>>>>> does not go through viommu translation when generating the
>>>>>>> interrupt, no?
>>>>>>
>>>>>> Yes, this is needed when KVM MSI routes are set up, i.e. along with
>>>>>> the GICv3 ITS.
>>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>>>
>>>>>> So I do not understand your previous sentence saying "MSI
>>>>>> interrupts work without any change".
>>>>>
>>>>> I have almost completed vfio integration with virtio-iommu and am now
>>>>> testing the changes by assigning an e1000 device to the VM. For this
>>>>> I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather
>>>>> than sw-msi, and this does not need changes in vfio_get_addr() and
>>>>> kvm_irqchip_add_msi_route().
>>>>
>>>> I understand you're reserving region 0x08000000-0x08100000 as
>>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>>>> It's not a coincidence that the addresses are the same, because Eric
>>>> chose them for the Linux SMMU drivers and I copied them.
>>>>
>>>> We can't rely on that behavior, though; it will break MSIs in
>>>> emulated devices. And if Qemu happens to move the MSI doorbell in
>>>> future machine revisions, then it would also break VFIO.
>>>
>>> Yes, makes sense to me.
>>>
>>>> Just for my own understanding -- what happens, I think, is that in
>>>> Linux iova_reserve_iommu_regions initially reserves the
>>>> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
>>>> the device driver requests an MSI, the irqchip driver calls
>>>> iommu_dma_map_msi_msg with the guest-physical gicv2m address
>>>> 0x08020000. The function finds the right page in msi_page_list, which
>>>> was added by cookie_init_hw_msi_region, therefore bypassing the
>>>> viommu, and the GPA gets written in the MSI-X table.
>>>
>>> This means that if tomorrow qemu changes the virt machine address map
>>> and the vgic-its (its-translater register) address range does not fall
>>> in the msi_page_list, then it will allocate a new iova and create a
>>> mapping in the iommu. So this will no longer be identity mapped and
>>> will fail to work with the new qemu?
>>
>> Yes that's correct.
>>>> If an emulated device such as virtio-net-pci were to generate an MSI,
>>>> then Qemu would attempt to access the doorbell written by Linux into
>>>> the MSI-X table, 0x08020000, and fault because that address wasn't
>>>> mapped in the viommu.
>>>>
>>>> So for VFIO, you either need to translate the MSI-X entry using the
>>>> viommu, or just assume that the vaddr corresponds to the only MSI
>>>> doorbell accessible by this device (because how can we be certain
>>>> that the guest already mapped the doorbell before writing the entry?)
>>>>
>>>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>>>> However, a nice way to use IOMMU_RESV_MSI would be for the
>>>> virtio-iommu device to advertise identity-mapped/reserved regions,
>>>> and bypass translation on these regions. Then the driver could
>>>> reserve those with IOMMU_RESV_MSI.
>>>
>>> Correct me if I did not understand you correctly: today the iommu
>>> driver decides the msi-reserved region. What if we change this so that
>>> the virtio-iommu device provides the reserved msi region appropriate
>>> for the emulated machine (virt/intel)? The virtio-iommu driver would
>>> then use the address advertised by the virtio-iommu device as
>>> IOMMU_RESV_MSI. In this case the msi-page-list will always contain the
>>> reserved region for MSI.
>>> On the qemu side, for emulated devices we will let virtio-iommu return
>>> the same address as the translated address, as it falls in the
>>> MSI-reserved page already known to it.
>>
>> I think what you're proposing here corresponds to the 1st approach that
>> was followed for PCIe passthrough/MSI on ARM, i.e. the userspace was
>> providing the reserved region base address & size. This was ruled out
>> and now this region is arbitrarily set by the smmu driver. At the
>> moment this means this region cannot contain guest RAM.
>
> In the rejected proposal, user-space used to choose a reserved region
> and provide that to Host Linux. Host Linux uses that MSI mapping.
> Just for my understanding: if tomorrow QEMU changes its address space,
> then it may not work without changing the SMMU msi-reserved-iova in the
> host driver, right? For example, if the emulated machine has RAM at this
> address then it will not work?

Yes that's correct. Note MSI reserved regions are now exposed to userspace
through /sys/kernel/iommu_groups/<>/reserved_regions

> In this proposal, QEMU reserves an iova-range for the guest (not the
> host) and the guest kernel will use this as an untranslated msi-iova
> (IOMMU_RESV_MSI). This does not change the host interface, and the host
> will continue to use its own reserved mapping for actual interrupt
> generation, no?

But then userspace needs to provide the IOMMU_RESV_MSI range to the guest
kernel, right? What would be the proposed manner? It looks weird to me to
have different MSI handling on host and guest. Also I still don't get how
you handle the case where virtio-net-pci emits accesses to the MSI
doorbell while the latter is not mapped.
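For reference, the reserved_regions file exposes one region per line as "start end type". The snippet below parses a sample line rather than reading sysfs directly, so it runs anywhere; the values assume the virt machine doorbell discussed in this thread:

```shell
# On a real host you would read, e.g.:
#   cat /sys/kernel/iommu_groups/0/reserved_regions
# Here we parse a sample line with the assumed virt-machine MSI window.
line="0x0000000008000000 0x00000000080fffff msi"
set -- $line
start=$1; end=$2; type=$3
echo "type=$type start=$start end=$end"
```

A guest (or management tool) could in principle discover the host's MSI window this way, which is what makes the reserved regions visible to userspace at all.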
Thanks

Eric

> Thanks
> -Bharat
>
>>>> For x86 we will need such a system, with an added IRQ remapping
>>>> feature.
>>>
>>> I do not understand x86 MSI interrupt generation, but if the above
>>> understanding is correct, then why do we need IRQ remapping for x86?
>>
>> To me x86 IR simply corresponds to the ITS MSI controller modality on
>> ARM. So just as you still need a vITS along with virtio-iommu on ARM,
>> you need vIR along with virtio-iommu on Intel. Does that make sense?
>>
>> So in any case we need to make sure the guest uses a vITS or vIR to
>> make sure MSIs are correctly isolated.
>>
>>> Will the x86 machine emulated in QEMU provide a big address range for
>>> MSIs, and when actually generating an MSI does it need some extra
>>> processing (IRQ-remapping processing) before actually generating the
>>> write transaction for the MSI interrupt?
>>
>> My understanding is that on x86, the MSI window is fixed and matches
>> [FEE0_0000h - FEF0_0000h]. MSIs are conveyed on a separate address
>> space from usual DMA accesses. And yes, they end up in IR if supported
>> in the HW.
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Thanks,
>>>> Jean