On 07/07/2017 08:25, Bharat Bhushan wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: Friday, July 07, 2017 2:47 AM
>> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Jean-Philippe Brucker
>> <jean-philippe.bruc...@arm.com>; eric.auger....@gmail.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>> robin.mur...@arm.com; christoffer.d...@linaro.org
>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>
>> Hi Bharat,
>>
>> On 06/07/2017 13:24, Bharat Bhushan wrote:
>>>
>>>> -----Original Message-----
>>>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>>>> Sent: Thursday, July 06, 2017 3:33 PM
>>>> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric
>>>> <eric.au...@redhat.com>; eric.auger....@gmail.com;
>>>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>>>> qemu-...@nongnu.org; qemu-devel@nongnu.org
>>>> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com;
>>>> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com;
>>>> robin.mur...@arm.com; christoffer.d...@linaro.org
>>>> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device
>>>>
>>>> On 05/07/17 09:49, Bharat Bhushan wrote:
>>>>>> Also when setting up the msi-route with kvm_irqchip_add_msi_route()
>>>>>> we needed to provide the translated address.
>>>>>>> According to my understanding this is required because the kernel
>>>>>>> does not go through viommu translation when generating the
>>>>>>> interrupt, no?
>>>>>>
>>>>>> Yes, this is needed when KVM MSI routes are set up, i.e. along with
>>>>>> the GICv3 ITS.
>>>>>> With GICv2M, qemu direct gsi mapping is used and this is not needed.
>>>>>>
>>>>>> So I do not understand your previous sentence saying "MSI
>>>>>> interrupts work without any change".
>>>>>
>>>>> I have almost completed vfio integration with virtio-iommu and am now
>>>>> testing the changes by assigning an e1000 device to the VM. For this
>>>>> I have changed the virtio-iommu driver to use IOMMU_RESV_MSI rather
>>>>> than sw-msi, and this does not need changes in vfio_get_addr() and
>>>>> kvm_irqchip_add_msi_route().
>>>>
>>>> I understand you're reserving region 0x08000000-0x08100000 as
>>>> IOMMU_RESV_MSI instead of IOMMU_RESV_SW_MSI? I think this only works
>>>> because Qemu places the vgic in that area as well (in hw/arm/virt.c).
>>>> It's not a coincidence that the addresses are the same, because Eric
>>>> chose them for the Linux SMMU drivers and I copied them.
>>>>
>>>> We can't rely on that behavior, though; it will break MSIs in
>>>> emulated devices. And if Qemu happens to move the MSI doorbell in
>>>> future machine revisions, then it would also break VFIO.
>>>
>>> Yes, makes sense to me.
>>>
>>>> Just for my own understanding -- what happens, I think, is that in
>>>> Linux iova_reserve_iommu_regions initially reserves the
>>>> guest-physical doorbell 0x08000000-0x08100000. Then much later, when
>>>> the device driver requests an MSI, the irqchip driver calls
>>>> iommu_dma_map_msi_msg with the guest-physical gicv2m address
>>>> 0x08020000. The function finds the right page in msi_page_list, which
>>>> was added by cookie_init_hw_msi_region, therefore bypassing the
>>>> viommu, and the GPA gets written in the MSI-X table.
>>>
>>> This means that if tomorrow qemu changes the virt machine address map
>>> and the vgic-its (its-translater register) address range does not fall
>>> in the msi_page_list, then it will allocate a new iova and create a
>>> mapping in the iommu. So this will no longer be identity mapped and
>>> will fail to work with the new qemu?
>>
>> Yes that's correct.
>>>> If an emulated device such as virtio-net-pci were to generate an MSI,
>>>> then Qemu would attempt to access the doorbell written by Linux into
>>>> the MSI-X table, 0x08020000, and fault because that address wasn't
>>>> mapped in the viommu.
>>>>
>>>> So for VFIO, you either need to translate the MSI-X entry using the
>>>> viommu, or just assume that the vaddr corresponds to the only MSI
>>>> doorbell accessible by this device (because how can we be certain
>>>> that the guest already mapped the doorbell before writing the entry?)
>>>>
>>>> For ARM machines it's probably best to stick with IOMMU_RESV_SW_MSI.
>>>> However, a nice way to use IOMMU_RESV_MSI would be for the
>>>> virtio-iommu device to advertise identity-mapped/reserved regions,
>>>> and bypass translation on these regions. Then the driver could
>>>> reserve those with IOMMU_RESV_MSI.
>>>
>>> Correct me if I did not understand you correctly: today the iommu
>>> driver decides the msi-reserved region. What if we change this so that
>>> the virtio-iommu device provides the reserved msi region appropriate
>>> for the emulated machine (virt/intel)? The virtio-iommu driver would
>>> then use the address advertised by the virtio-iommu device as
>>> IOMMU_RESV_MSI. In this case the msi-page-list will always contain the
>>> reserved region for MSI.
>>> On the qemu side, for emulated devices we will let virtio-iommu return
>>> the same address as the translated address, as it falls in the
>>> MSI-reserved page already known to it.
>>
>> I think what you're proposing here corresponds to the 1st approach that
>> was followed for PCIe passthrough/MSI on ARM, i.e. the userspace was
>> providing the reserved region base address & size. This was ruled out
>> and now this region is arbitrarily set by the smmu driver. At the
>> moment this means this region cannot contain guest RAM.
>
> In the rejected proposal, user-space used to choose a reserved region
> and provide that to Host Linux. Host Linux uses that MSI mapping.
> Just for my understanding: if tomorrow QEMU changes its address space,
> then it may not work without changing the SMMU msi-reserved-iova in the
> host driver, right? For example, if the emulated machine has RAM at this
> address then it will not work?

Yes that's correct. Note MSI reserved regions are now exposed to userspace
through /sys/kernel/iommu_groups/<>/reserved_regions

> In this proposal, QEMU reserves an iova-range for the guest (not the
> host) and the guest kernel will use this as an untranslated msi-iova
> (IOMMU_RESV_MSI). This does not change the host interface, and the host
> will continue to use its own reserved mapping for actual interrupt
> generation, no?

But then userspace needs to provide the IOMMU_RESV_MSI range to the guest
kernel, right? What would be the proposed manner? It looks weird to me to
have different MSI handling on host and guest. Also I still don't get how
you handle the case where virtio-net-pci emits accesses to the MSI
doorbell while the latter is not mapped.
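For reference, the reserved_regions file exposes one region per line as "start end type". The snippet below parses a sample line rather than reading sysfs directly, so it runs anywhere; the values assume the virt machine doorbell discussed in this thread:

```shell
# On a real host you would read, e.g.:
#   cat /sys/kernel/iommu_groups/0/reserved_regions
# Here we parse a sample line with the assumed virt-machine MSI window.
line="0x0000000008000000 0x00000000080fffff msi"
set -- $line
start=$1; end=$2; type=$3
echo "type=$type start=$start end=$end"
```

A guest (or management tool) could in principle discover the host's MSI window this way, which is what makes the reserved regions visible to userspace at all.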
Thanks

Eric

> Thanks
> -Bharat
>
>>>> For x86 we will need such a system, with an added IRQ remapping
>>>> feature.
>>>
>>> I do not understand x86 MSI interrupt generation, but if the above
>>> understanding is correct, then why do we need IRQ remapping for x86?
>>
>> To me x86 IR simply corresponds to the ITS MSI controller modality on
>> ARM. So just as you still need a vITS along with virtio-iommu on ARM,
>> you need vIR along with virtio-iommu on Intel. Does that make sense?
>>
>> So in any case we need to make sure the guest uses a vITS or vIR to
>> make sure MSIs are correctly isolated.
>>
>>> Will the x86 machine emulated in QEMU provide a big address range for
>>> MSIs, and when actually generating an MSI does it need some extra
>>> processing (IRQ-remapping processing) before actually generating the
>>> write transaction for the MSI interrupt?
>>
>> My understanding is that on x86, the MSI window is fixed and matches
>> [FEE0_0000h - FEF0_0000h]. MSIs are conveyed on a separate address
>> space from usual DMA accesses. And yes, they end up in IR if supported
>> in the HW.
>>
>> Thanks
>>
>> Eric
>>>
>>> Thanks
>>> -Bharat
>>>
>>>>
>>>> Thanks,
>>>> Jean