Hi Eric,

> >>>> -----Original Message-----
> >>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> >>>> Sent: Thursday, July 06, 2017 3:33 PM
> >>>> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
> >>>> <eric.au...@redhat.com>; eric.auger....@gmail.com;
> >>>> peter.mayd...@linaro.org; alex.william...@redhat.com;
> >>>> m...@redhat.com; qemu-...@nongnu.org; qemu-devel@nongnu.org
> >>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
> >>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
> >>>> robin.mur...@arm.com; christoffer.d...@linaro.org
> >>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
> >>>>
> >>>> On 05/07/17 09:49, Bharat Bhushan wrote:
> >>>>>>> Also when setting up the msi-route with kvm_irqchip_add_msi_route()
> >>>>>>> we needed to provide the translated address.
> >>>>>>> According to my understanding this is required because the kernel
> >>>>>>> does not go through viommu translation when generating the
> >>>>>>> interrupt, no?
> >>>>>>
> >>>>>> yes this is needed when KVM MSI routes are set up, ie. along with
> >>>>>> GICV3 ITS.
> >>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
> >>>>>>
> >>>>>> So I do not understand your previous sentence saying "MSI
> >>>>>> interrupts works without any change".
> >>>>>
> >>>>> I have almost completed vfio integration with virtio-iommu and am now
> >>>>> testing the changes by assigning an e1000 device to the VM. For this
> >>>>> I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather
> >>>>> than sw-msi, and this does not need changes in vfio_get_addr() and
> >>>>> kvm_irqchip_add_msi_route().
> >>>>
> >>>> I understand you're reserving region 0x08000000-0x08100000 as
> >>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
> >>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
> >>>> It's not a coincidence that the addresses are the same, because Eric
> >>>> chose them for the Linux SMMU drivers and I copied them.
> >>>>
> >>>> We can't rely on that behavior, though, as it will break MSIs in
> >>>> emulated devices. And if Qemu happens to move the MSI doorbell in
> >>>> future machine revisions, then it would also break VFIO.
> >>>
> >>> Yes, makes sense to me.
> >>>
> >>>> Just for my own understanding -- what happens, I think, is that in
> >>>> Linux iova_reserve_iommu_regions initially reserves the
> >>>> guest-physical doorbell 0x08000000-0x08100000. Then much later,
> >>>> when the device driver requests an MSI, the irqchip driver calls
> >>>> iommu_dma_map_msi_msg with the guest-physical gicv2m address
> >>>> 0x08020000. The function finds the right page in msi_page_list,
> >>>> which was added by cookie_init_hw_msi_region, therefore bypassing
> >>>> the viommu, and the GPA gets written in the MSI-X table.
> >>>
> >>> This means that if tomorrow qemu changes the virt machine address map
> >>> and the vgic-its (its-translator register) address range does not fall
> >>> in the msi_page_list, then it will allocate a new iova and create a
> >>> mapping in the iommu. So this will no longer be identity mapped and
> >>> will fail to work with the new qemu?
> >>
> >> Yes that's correct.
> >>
> >>>> If an emulated device such as virtio-net-pci were to generate an
> >>>> MSI, then Qemu would attempt to access the doorbell written by
> >>>> Linux into the MSI-X table, 0x08020000, and fault because that
> >>>> address wasn't mapped in the viommu.
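
To make the bypass described above concrete, here is a small standalone
sketch (toy user-space C, not the actual kernel code) of the rule being
described: a doorbell GPA that falls inside the reserved MSI window is used
untranslated, anything outside it would need a real viommu mapping. The
helper names and the little demo are made up for illustration; only
MSI_IOVA_BASE/MSI_IOVA_LENGTH mirror the names used in the Linux SMMU
drivers, with the virt machine values discussed above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Values used by the Linux SMMU drivers, matching today's hw/arm/virt.c */
#define MSI_IOVA_BASE    0x08000000ULL
#define MSI_IOVA_LENGTH  0x00100000ULL

/* Illustrative helper: is this GPA inside the reserved MSI window? */
static bool in_resv_msi_window(uint64_t gpa)
{
    return gpa >= MSI_IOVA_BASE && gpa < MSI_IOVA_BASE + MSI_IOVA_LENGTH;
}

/*
 * Illustrative helper: decide what ends up in the MSI-X entry. Inside the
 * reserved window the GPA is used as-is (the msi_page_list bypass described
 * above); outside it, an IOVA would have to be allocated and mapped through
 * the viommu, which is exactly the case that breaks when the machine
 * address map changes.
 */
static uint64_t msi_doorbell_iova(uint64_t doorbell_gpa)
{
    if (in_resv_msi_window(doorbell_gpa)) {
        return doorbell_gpa;
    }
    fprintf(stderr, "doorbell 0x%llx is outside the reserved window and "
            "needs a viommu mapping\n", (unsigned long long)doorbell_gpa);
    return doorbell_gpa;
}

int main(void)
{
    /* GICv2m doorbell on today's virt machine */
    printf("MSI-X entry gets 0x%llx\n",
           (unsigned long long)msi_doorbell_iova(0x08020000ULL));
    return 0;
}
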
> >>>>
> >>>> So for VFIO, you either need to translate the MSI-X entry using the
> >>>> viommu, or just assume that the vaddr corresponds to the only MSI
> >>>> doorbell accessible by this device (because how can we be certain
> >>>> that the guest already mapped the doorbell before writing the
> >>>> entry?)
> >>>>
> >>>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
> >>>> However, a nice way to use IOMMU_RESV_MSI would be for the
> >>>> virtio-iommu device to advertise identity-mapped/reserved regions,
> >>>> and bypass translation on these regions. Then the driver could
> >>>> reserve those with IOMMU_RESV_MSI.
> >>>
> >>> Correct me if I did not understand you correctly: today the
> >>> iommu-driver decides the msi-reserved region. What if we change this
> >>> and the virtio-iommu device provides the reserved msi region as per
> >>> the emulated machine (virt/intel)? The virtio-iommu driver would then
> >>> use the address advertised by the virtio-iommu device as
> >>> IOMMU_RESV_MSI. In this case msi-page-list will always have the
> >>> reserved region for MSI.
> >>> On the qemu side, for emulated devices we will let virtio-iommu return
> >>> the same address as the translated address, as it falls in the
> >>> MSI-reserved page already known to it.
> >>
> >> I think what you're proposing here corresponds to the 1st approach
> >> that was followed for PCIe passthrough/MSI on ARM, ie. the userspace
> >> was providing the reserved region base address & size.
> >> This was ruled out and now this region is arbitrarily set by the
> >> smmu-driver. At the moment this means this region cannot contain
> >> guest RAM.
> >
> > In the rejected proposal, user-space used to choose a reserved region
> > and provide that to Host Linux, and Host Linux uses that MSI mapping.
> > Just for my understanding: if tomorrow QEMU changes its address space,
> > then it may not work without changing the SMMU msi-reserved-iova in the
> > host driver, right? For example, if the emulated machine has RAM at
> > this address then it will not work?
>
> Yes that's correct. Note MSI reserved regions are now exposed to userspace
> through /sys/kernel/iommu_groups/<>/reserved_regions
>
> > In this proposal, QEMU reserves an iova-range for the guest (not the
> > host) and the guest kernel will use this as msi-iova untranslated
> > (IOMMU_RESV_MSI). So this does not change the host interface and it
> > will continue to use the host reserved mapping for actual interrupt
> > generation, no?
>
> But then userspace needs to provide the IOMMU_RESV_MSI range to the guest
> kernel, right? What would be the proposed manner?

Just an opinion: we can define a feature (VIRTIO_IOMMU_F_RES_MSI_RANGE) and
provide this info via a command (VIRTIO_IOMMU_T_MSI_RANGE). The guest
iommu-driver will make this call during initialization and store the value.
This value will just replace the MSI_IOVA_BASE and MSI_IOVA_LENGTH #defines;
the rest will remain the same in the virtio-iommu driver. (A rough sketch is
at the end of this mail.)

> Looks weird to me to have different MSI handling on host and guest. Also
> I still don't get how you handle the case where virtio-net-pci emits
> accesses to the MSI doorbell while the latter is not mapped.

I will look at the virtio-net-pci code in detail; I was thinking of whether
we can handle this region differently, or make a 1:1 translation in
virtio_iommu_translate() for this region. But my knowledge is limited around
this piece of code. Is this possible, or ugly?

Thanks
-Bharat

> >
> > Thanks
> > Eric
> >
> > Thanks
> > -Bharat
> >
> >>>> For x86 we will need such a system, with an added IRQ remapping
> >>>> feature.
> >>>
> >>> I do not understand x86 MSI interrupt generation, but if the above
> >>> understanding is correct, then why do we need IRQ remapping for x86?
> >>
> >> To me x86 IR simply corresponds to the ITS MSI controller modality on
> >> ARM. So just as you still need a vITS along with virtio-iommu on ARM,
> >> you need vIR along with virtio-iommu on Intel. Does that make sense?
> >>
> >> So in any case we need to make sure the guest uses a vITS or vIR to
> >> make sure MSIs are correctly isolated.
> >>
> >>> Will the x86 machine emulated in QEMU provide a big address range for
> >>> MSIs, and when actually generating an MSI will it need some extra
> >>> processing (IRQ-remapping processing) before actually generating the
> >>> write transaction for the MSI interrupt?
> >>
> >> My understanding is that on x86 the MSI window is fixed and matches
> >> [FEE0_0000h - FEF0_0000h]. MSIs are conveyed on a separate address
> >> space than usual DMA accesses. And yes they end up in IR if supported
> >> in the HW.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>> Thanks
> >>> -Bharat
> >>>
> >>>>
> >>>> Thanks,
> >>>> Jean
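
For the VIRTIO_IOMMU_F_RES_MSI_RANGE / VIRTIO_IOMMU_T_MSI_RANGE idea above,
here is the kind of request I have in mind. Only the two names come from the
proposal in this mail; the feature-bit value, request-type value and field
layout below are invented purely for illustration and would need to be
specified properly.

#include <stdint.h>

/* Hypothetical feature bit, negotiated like the other VIRTIO_IOMMU_F_* bits */
#define VIRTIO_IOMMU_F_RES_MSI_RANGE   5

/* Hypothetical request type, sent once on the request virtqueue at probe time */
#define VIRTIO_IOMMU_T_MSI_RANGE       0x05

struct virtio_iommu_req_msi_range {
    uint8_t   type;        /* VIRTIO_IOMMU_T_MSI_RANGE */
    uint8_t   reserved[3];
    /* Filled in by the device: */
    uint64_t  msi_start;   /* e.g. 0x08000000 on the virt machine */
    uint64_t  msi_end;     /* e.g. 0x08100000 */
    uint8_t   status;
} __attribute__((packed));

The guest driver would issue this once during initialization and register
[msi_start, msi_end) as IOMMU_RESV_MSI in place of the hard-coded
MSI_IOVA_BASE/MSI_IOVA_LENGTH, and on the QEMU side virtio_iommu_translate()
would return an identity mapping for that window (the 1:1 translation idea
above), which would presumably also cover the virtio-net-pci case.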