On 22/07/2024 16:58, Cédric Le Goater wrote: > On 7/22/24 17:42, Joao Martins wrote: >> On 22/07/2024 16:13, Cédric Le Goater wrote: >>> On 7/22/24 17:01, Joao Martins wrote: >>>> On 22/07/2024 15:53, Cédric Le Goater wrote: >>>>> On 7/19/24 19:26, Joao Martins wrote: >>>>>> On 19/07/2024 15:24, Joao Martins wrote: >>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote: >>>>>>>> On 7/19/24 14:05, Joao Martins wrote: >>>>>>>>> By default VFIO migration is set to auto, which will support live >>>>>>>>> migration if the migration capability is set *and* also dirty page >>>>>>>>> tracking is supported. >>>>>>>>> >>>>>>>>> For testing purposes one can force enable without dirty page tracking >>>>>>>>> via enable-migration=on, but that option is generally left for testing >>>>>>>>> purposes. >>>>>>>>> >>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the >>>>>>>>> lack of >>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements for >>>>>>>>> migration and thus enabling migration by default for those too. >>>>>>>>> >>>>>>>>> While at it change the error messages to mention IOMMU dirty tracking >>>>>>>>> as >>>>>>>>> well. >>>>>>>>> >>>>>>>>> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com> >>>>>>>>> --- >>>>>>>>> include/hw/vfio/vfio-common.h | 1 + >>>>>>>>> hw/vfio/iommufd.c | 2 +- >>>>>>>>> hw/vfio/migration.c | 11 ++++++----- >>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h >>>>>>>>> b/include/hw/vfio/vfio-common.h >>>>>>>>> index 7e530c7869dc..00b9e933449e 100644 >>>>>>>>> --- a/include/hw/vfio/vfio-common.h >>>>>>>>> +++ b/include/hw/vfio/vfio-common.h >>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const >>>>>>>>> VFIOContainerBase *bcontainer, >>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, >>>>>>>>> Error >>>>>>>>> **errp); >>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, >>>>>>>>> uint64_t >>>>>>>>> iova, >>>>>>>>> uint64_t size, ram_addr_t ram_addr, >>>>>>>>> Error >>>>>>>>> **errp); >>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt); >>>>>>>>> /* Returns 0 on success, or a negative errno. */ >>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp); >>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644 >>>>>>>>> --- a/hw/vfio/iommufd.c >>>>>>>>> +++ b/hw/vfio/iommufd.c >>>>>>>>> @@ -111,7 +111,7 @@ static void >>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice >>>>>>>>> *vbasedev) >>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd); >>>>>>>>> } >>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>>>>> { >>>>>>>>> return hwpt && hwpt->hwpt_flags & >>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING; >>>>>>>>> } >>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644 >>>>>>>>> --- a/hw/vfio/migration.c >>>>>>>>> +++ b/hw/vfio/migration.c >>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice >>>>>>>>> *vbasedev, >>>>>>>>> Error **errp) >>>>>>>>> return !vfio_block_migration(vbasedev, err, errp); >>>>>>>>> } >>>>>>>>> - if (!vbasedev->dirty_pages_supported) { >>>>>>>>> + if (!vbasedev->dirty_pages_supported && >>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) { >>>>>>>> >>>>>>>> >>>>>>>> Some platforms do not have IOMMUFD support and this call will need >>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in >>>>>>>> the IOMMU backend. >>>>>>>> >>>>>>> >>>>>>> This was actually on purpose because only IOMMUFD presents a view of >>>>>>> hardware >>>>>>> whereas type1 supporting dirty page tracking is not used as means to >>>>>>> 'migration >>>>>>> is supported'. >>>>>>> >>>>>>> The hwpt is nil in type1 and the helper checks that, so it should return >>>>>>> false. >>>>>>> >>>>>> >>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally >>>>>> didn't >>>>>> consider. Maybe this would be a elegant way to address it? Looks to pass >>>>>> my >>>>>> build with CONFIG_IOMMUFD=n >>>>>> >>>>>> diff --git a/include/hw/vfio/vfio-common.h >>>>>> b/include/hw/vfio/vfio-common.h >>>>>> index 61dd48e79b71..422ad4a5bdd1 100644 >>>>>> --- a/include/hw/vfio/vfio-common.h >>>>>> +++ b/include/hw/vfio/vfio-common.h >>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const >>>>>> VFIOContainerBase >>>>>> *bcontainer, >>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error >>>>>> **errp); >>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, >>>>>> uint64_t >>>>>> iova, >>>>>> uint64_t size, ram_addr_t ram_addr, Error >>>>>> **errp); >>>>>> +#ifdef CONFIG_IOMMUFD >>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt); >>>>>> +#else >>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>> +{ >>>>>> + return false; >>>>>> +} >>>>>> +#endif >>>>>> >>>>>> /* Returns 0 on success, or a negative errno. */ >>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp); >>>>>> >>>>> >>>>> hmm, no. You will need to introduce a new Host IOMMU device capability, >>>>> something like : >>>>> >>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, >>>>> >>>>> Then, introduce an helper routine to check the capability : >>>>> >>>>> return hiodc->get_cap( ... HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...) >>>>> and replace the iommufd_hwpt_dirty_tracking call with it. >>>>> >>>>> Yeah I know, it's cumbersome but it's cleaner ! >>>>> >>>> >>>> Funny you mention it, because that's what I did in v3: >>>> >>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.mart...@oracle.com/ >>>> >>>> But it was suggested to drop (I am assuming to avoid complexity) >>> >>> my bad if I did :/ >>> >> >> No worries it is all part of review -- I think Zhenzhong proposed with good >> intentions, and I probably didn't think too hard about the consequences on >> layering with the HIOD. >> >>> we will need an helper such as : >>> >>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev) >>> { >>> HostIOMMUDevice *hiod = vbasedev->hiod ; >>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod); >>> >>> return hiodc->get_cap && >>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, NULL) >>> == 1; >>> } >>> >>> and something like, >>> >>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap, >>> Error **errp) >>> { >>> switch (cap) { >>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING: >>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING); >>> default: >>> error_setg(errp, "%s: unsupported capability %x", hiod->name, >>> cap); >>> return -EINVAL; >>> } >>> } >>> >>> Feel free to propose your own implementation, >>> >> >> Actually it's close to what I had in v3 link, except the new helper (the name >> vfio_device_dirty_tracking is a bit misleading I would call it >> vfio_device_iommu_dirty_tracking) > > Let's call it vfio_device_iommu_dirty_tracking. >
I thinking about this and I am not that sure it makes sense. That is the .get_cap() stuff. Using the hw_caps is only useful when choosing hwpt_flags, then the only thing that matters for patch 12 is after the device is attached ... hence we gotta look at hwpt_flags. That ultimately is what tells if dirty tracking can be done in the device pagetable. I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it feels just as hacky given that I am testing its enablement of the hardware pagetable (HWPT), and not asking a HIOD capability. e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the attach_device() flow[*], but not for vfio_migration_realize() flow. [*] though feels unneeded as we only have a local callsite, not external user so far. Which would technically make v5.1 patch a more correct right check, perhaps with better layering/naming. >> I can follow-up with this improvement in case this gets merged as is, > > I can't merge as is since it break compiles (I am excluding the v5.1 patch). > Which means I would prefer a v6 please. > Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the compilation issue and all that remained were acks. >> or include >> it in the next version if you prefer to adjourn this series into 9.2 (given >> the >> lack of time to get everything right). > > There aren't many open questions left. > > * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or > Eric acked the changes. > * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits(). > I think it's minor. I would also feel more confortable if ZhenZhong > acked the changes. I guess you meant patch 6 and not 9. > * PATCH 12 needs the fix we have been talking about. > * PATCH 13 is for dev/debug. > > > What's important is to avoid introducing regressions in the current behavior, > that is when not using IOMMUFD. It looks fine on that aspect AFAICT. OK