[sorry for the delay, I was out last week]

On 10/06/2024 17:53, Cédric Le Goater wrote:
> On 6/7/24 5:10 PM, Joao Martins wrote:
>> On 06/06/2024 16:43, Cédric Le Goater wrote:
>>> Hello Joao,
>>>
>>> On 6/22/23 23:48, Joao Martins wrote:
>>>> Hey,
>>>>
>>>> This series introduces support for vIOMMU with VFIO device migration,
>>>> particularly related to how we do the dirty page tracking.
>>>>
>>>> Today vIOMMUs serve two purposes: 1) enabling interrupt remapping and
>>>> 2) providing DMA translation services so guests can do some form of
>>>> guest-kernel-managed DMA, e.g. for nested-virt-based usage. (1) is
>>>> especially required for big VMs with VFs and more than 255 vCPUs. We
>>>> tackle both and remove the migration blocker when a vIOMMU is present,
>>>> provided the conditions are met. I have both use cases here in one
>>>> series, but I am happy to tackle them in separate series.
>>>>
>>>> As I found out, we don't necessarily need to expose the whole vIOMMU
>>>> functionality in order to support just interrupt remapping. x86 IOMMUs
>>>> on Windows Server 2018[2] and Linux >= 5.10, with QEMU 7.1+ (or really
>>>> Linux guests with commit c40aaaac10 and since QEMU commit 8646d9c773d8),
>>>> can instantiate an IOMMU just for interrupt remapping without needing
>>>> to advertise/support DMA translation. The AMD IOMMU can in theory
>>>> provide the same, but Linux doesn't quite support the IR-only part
>>>> there yet; only intel-iommu does.
>>>>
>>>> The series is organized as follows:
>>>>
>>>> Patches 1-5: Today we can't gather vIOMMU details before the guest
>>>> establishes its first DMA mapping via the vIOMMU. So these first
>>>> patches add a way for vIOMMUs to be queried for their properties at
>>>> start of day. I chose the least-churn approach for now (as opposed to
>>>> a treewide conversion), which allows easy conversion a posteriori. As
>>>> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6], which
>>>> let us fetch the PCI device's backing vIOMMU attributes without
>>>> necessarily tying the caller (VFIO or anyone else) to an IOMMU MR like
>>>> I was doing in v3.
>>>>
>>>> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
>>>> DMA translation allowed. Today the 'dma-translation' attribute is
>>>> x86-iommu only, but the way this series is structured nothing stops
>>>> other vIOMMUs from supporting it too, as long as they use
>>>> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
>>>> are handled. The blocker is thus relaxed when vIOMMUs are able to
>>>> toggle/report the DMA_TRANSLATION attribute. With the patches up to
>>>> this set, we've then tackled item (1) of the second paragraph.
>>>>
>>>> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the
>>>> complete IOVA address space, leveraging the logic we use to compose
>>>> the dirty ranges. The blocker is once again relaxed for vIOMMUs that
>>>> advertise their IOVA addressing limits. This tackles item (2). So far
>>>> I mainly use it with intel-iommu, although I have a small set of
>>>> patches for virtio-iommu per Alex's suggestion in v2.
>>>>
>>>> Comments, suggestions welcome. Thanks for the review!
>>>
>>>
>>> I spent some time refreshing your series on upstream QEMU (see [1]) and
>>> gave migration a try with a CX-7 VF. LGTM. It doesn't seem we are far
>>> from acceptance in QEMU 9.1. Are we?
>>>
>> Yeah.
>>
>> There was a comment from Zhenzhong on vfio_viommu_preset() here[0]. I
>> was looking at that to remind myself what it was that we had to change,
>> but even after re-reading the thread I can't spot any flaw that needs
>> changing.
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/de2b72d2-f56b-9350-ce0f-70edfb58e...@intel.com/#r
> 
> I introduced a vfio_devices_all_viommu_preset() routine to check all devices
> in a container, and a simplified version of vfio_viommu_get_max_iova()
> returning the address space's max_iova.
>

/me nods
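
Something like this is the shape I'd expect, i.e. the per-device check
being whether the container's address space differs from plain system
memory (rough sketch only; field names per the current VFIOContainerBase
layout, the branch may differ):

static bool vfio_viommu_preset(VFIODevice *vbasedev)
{
    /* A device sits behind a vIOMMU iff its container's address space
     * is something other than the plain system memory address space. */
    return vbasedev->bcontainer->space->as != &address_space_memory;
}

static bool vfio_devices_all_viommu_preset(VFIOContainerBase *bcontainer)
{
    VFIODevice *vbasedev;

    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
        if (!vfio_viommu_preset(vbasedev)) {
            return false;
        }
    }

    return true;
}

And I assume vfio_viommu_get_max_iova() then just takes the minimum of
the per-device vIOMMU addressing limits, rather than composing
per-mapping ranges.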

> 
>>> First, I will resend these with the changes I made :
>>>
>>>    vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
>>>    vfio/common: Move dirty tracking ranges update to helper()
>>>
>>> I guess the PCIIOMMUOps::get_iommu_attr needs a close review. Is
>>> IOMMU_ATTR_DMA_TRANSLATION a must-have?
>>>
>> It's sort of the 'correct way' of relaxing vIOMMU checks, because you
>> are 100% guaranteed that the guest won't do DMA. The other outstanding
>> thing related to that is for older guest kernels: use the direct map
>> for dirty page tracking, but the moment a mapping is attempted the
>> migration doesn't start, or if it's in progress it gets aborted[*]:
>>
>> https://lore.kernel.org/qemu-devel/20230908120521.50903-1-joao.m.mart...@oracle.com/
>>
>> The above link and DMA_TRANSLATION are mostly for our use case, which
>> only cares about the vIOMMU for interrupt remapping and not for DMA
>> translation services. But we can't just disable dma-translation in QEMU
>> because it may crash older kernels, so this way it supports both old
>> and new.
>>
>> [*] Recently I noticed you improved error reporting, so that
>> vfio_set_migration_error(-EOPNOTSUPP) probably has a better way of
>> getting there now.
> 
> Yes. So, I made a few more changes to improve vfio_dirty_tracking_init().
> 
/me nods
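
FWIW, the relaxed check I have in mind is roughly the following (sketch
only; IOMMU_ATTR_DMA_TRANSLATION and the PCI-level getter are introduced
by this series, so the names/signatures below aren't upstream):

static bool vfio_viommu_dma_translation_enabled(PCIDevice *pdev)
{
    bool enabled = true;

    /* pci_device_get_iommu_attr() is the hypothetical PCIIOMMUOps-based
     * getter from the first patches of the series. If the vIOMMU can't
     * be queried, assume DMA translation is possible and keep the
     * migration blocker in place. */
    if (pci_device_get_iommu_attr(pdev, IOMMU_ATTR_DMA_TRANSLATION,
                                  &enabled)) {
        return true;
    }

    return enabled;
}

i.e. failure to query keeps today's conservative behaviour, so vIOMMUs
that don't implement the attribute are unaffected.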

>>> The rest is mostly VFIO internals for dirty tracking.
>>>
>> Right.
>>
>> I got derailed by other work, and also by stuff required for IOMMU
>> dirty tracking, such that I forgot about these patches, sorry.
> 
> That's fine.
> 
> I am trying to sort out which patches could be merged in advance for QEMU 9.1,
> and your series has shrunk a lot since it was last sent. I might resend the
> whole thing to cherry-pick the simple changes and get some R-b tags.
> 

OK, sounds good.

> 
> For the record, here is my watch list:
> 
> QEMU series under review:
> 
> * [v7] Add a host IOMMU device abstraction to check with vIOMMU
>   https://lore.kernel.org/all/20240605083043.317831-1-zhenzhong.d...@intel.com
> 
>   Needs feedback on the PCI IOMMU ops. vIOMMU "iommufd" property?

Part of the reason I suggested splitting it was to allow other things to
progress, as the IOMMU ops part is related to the nesting support.
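
i.e. something of this shape on the PCI side (signature inferred from
the discussion, not the final form):

typedef struct PCIIOMMUOps {
    AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque,
                                        int devfn);
    /*
     * Query a vIOMMU property (e.g. DMA translation, addressing
     * limits) without the caller holding an IOMMU memory region.
     * IOMMUAttr here is the attribute enum this series adds; returns
     * 0 on success, -errno if the attribute is unsupported.
     */
    int (*get_iommu_attr)(PCIBus *bus, void *opaque, int devfn,
                          IOMMUAttr attr, void *data);
} PCIIOMMUOps;

with the vIOMMU wiring it up via pci_setup_iommu_ops(), so callers like
VFIO don't need to reach for an IOMMU MR at all.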

>   Pushed on vfio-9.1 branch.
> 
> * [RFC v2] VIRTIO-IOMMU/VFIO: Fix host iommu geometry
>   https://lore.kernel.org/all/20240607143905.765133-1-eric.au...@redhat.com
>   Pushed on vfio-9.1 branch.
> 
> Need a resend:
> 
> * [v4] vfio: VFIO migration support with vIOMMU
>   https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/
>   Refreshed the patchset on upstream and pushed on vfio-9.1 branch.

/me nods

This one probably deserves an item on the list too, related to the same
subject of vIOMMU and migration, once the vIOMMU series is done:

* https://lore.kernel.org/qemu-devel/20230908120521.50903-1-joao.m.mart...@oracle.com/

> * [RFCv2] vfio/iommufd: IOMMUFD Dirty Tracking
>   https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.mart...@oracle.com/
> 
I plan on still submitting a follow-up targeting 9.1, likely next week, with
Avihai's comments addressed on top of the vfio-9.1 branch, after I send some
dirty tracking fixes on the kernel side. Though it is mostly to progress
review, as I think I am still dependent on Zhenzhong's prep series for merging
because of this patch:
https://lore.kernel.org/all/20240605083043.317831-8-zhenzhong.d...@intel.com/

> * [v1] vfio: container: Fix missing allocation of VFIOSpaprContainer
>   https://lore.kernel.org/qemu-devel/171528203026.8420.10620440513237875837.st...@ltcd48-lp2.aus.stglabs.ibm.com/
>  
> Other interesting series (IOMMU related):
> 
> * [rfcv2] intel_iommu: Enable stage-1 translation for emulated device
>   https://lore.kernel.org/all/20240522062313.453317-1-zhenzhong.d...@intel.com/
> 
> * [PATCH ats_vtd v2 00/25] ATS support for VT-d
>   https://lore.kernel.org/all/20240515071057.33990-1-clement.mathieu--d...@eviden.com/
> 
> * [RFC v3] SMMUv3 nested translation support
>   https://lore.kernel.org/qemu-devel/20240429032403.74910-1-smost...@google.com/
> 
> * [PATCH RFCv1 00/14] Add Tegra241 (Grace) CMDQV Support (part 2/2)
>   https://lore.kernel.org/all/cover.1712978212.git.nicol...@nvidia.com/
> 
>   Yet to be sent,
>   https://github.com/nicolinc/qemu/commits/wip/iommufd_vcmdq/
> 
> * [RFC] Multifd device state transfer support with VFIO consumer
>   https://lore.kernel.org/all/cover.1713269378.git.maciej.szmigi...@oracle.com/
> 

I think Maciej (CC'ed) plans on submitting a (simplified, I think?) v2 of
this one shortly, still targeting 9.1. But it is largely migration core
changes, with only the last two patches touching VFIO.

        Joao
