On 22/07/2024 18:04, Cédric Le Goater wrote: > On 7/22/24 18:29, Joao Martins wrote: >> On 22/07/2024 16:58, Cédric Le Goater wrote: >>> On 7/22/24 17:42, Joao Martins wrote: >>>> On 22/07/2024 16:13, Cédric Le Goater wrote: >>>>> On 7/22/24 17:01, Joao Martins wrote: >>>>>> On 22/07/2024 15:53, Cédric Le Goater wrote: >>>>>>> On 7/19/24 19:26, Joao Martins wrote: >>>>>>>> On 19/07/2024 15:24, Joao Martins wrote: >>>>>>>>> On 19/07/2024 15:17, Cédric Le Goater wrote: >>>>>>>>>> On 7/19/24 14:05, Joao Martins wrote: >>>>>>>>>>> By default VFIO migration is set to auto, which will support live >>>>>>>>>>> migration if the migration capability is set *and* also dirty page >>>>>>>>>>> tracking is supported. >>>>>>>>>>> >>>>>>>>>>> For testing purposes one can force enable without dirty page >>>>>>>>>>> tracking >>>>>>>>>>> via enable-migration=on, but that option is generally left for >>>>>>>>>>> testing >>>>>>>>>>> purposes. >>>>>>>>>>> >>>>>>>>>>> So starting with IOMMU dirty tracking it can use to accomodate the >>>>>>>>>>> lack of >>>>>>>>>>> VF dirty page tracking allowing us to minimize the VF requirements >>>>>>>>>>> for >>>>>>>>>>> migration and thus enabling migration by default for those too. >>>>>>>>>>> >>>>>>>>>>> While at it change the error messages to mention IOMMU dirty >>>>>>>>>>> tracking as >>>>>>>>>>> well. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com> >>>>>>>>>>> --- >>>>>>>>>>> include/hw/vfio/vfio-common.h | 1 + >>>>>>>>>>> hw/vfio/iommufd.c | 2 +- >>>>>>>>>>> hw/vfio/migration.c | 11 ++++++----- >>>>>>>>>>> 3 files changed, 8 insertions(+), 6 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/hw/vfio/vfio-common.h >>>>>>>>>>> b/include/hw/vfio/vfio-common.h >>>>>>>>>>> index 7e530c7869dc..00b9e933449e 100644 >>>>>>>>>>> --- a/include/hw/vfio/vfio-common.h >>>>>>>>>>> +++ b/include/hw/vfio/vfio-common.h >>>>>>>>>>> @@ -299,6 +299,7 @@ int vfio_devices_query_dirty_bitmap(const >>>>>>>>>>> VFIOContainerBase *bcontainer, >>>>>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, >>>>>>>>>>> Error >>>>>>>>>>> **errp); >>>>>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, >>>>>>>>>>> uint64_t >>>>>>>>>>> iova, >>>>>>>>>>> uint64_t size, ram_addr_t ram_addr, >>>>>>>>>>> Error >>>>>>>>>>> **errp); >>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt); >>>>>>>>>>> /* Returns 0 on success, or a negative errno. */ >>>>>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp); >>>>>>>>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>>>>>>>>>> index 7dd5d43ce06a..a998e8578552 100644 >>>>>>>>>>> --- a/hw/vfio/iommufd.c >>>>>>>>>>> +++ b/hw/vfio/iommufd.c >>>>>>>>>>> @@ -111,7 +111,7 @@ static void >>>>>>>>>>> iommufd_cdev_unbind_and_disconnect(VFIODevice >>>>>>>>>>> *vbasedev) >>>>>>>>>>> iommufd_backend_disconnect(vbasedev->iommufd); >>>>>>>>>>> } >>>>>>>>>>> -static bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>>>>>>> +bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>>>>>>> { >>>>>>>>>>> return hwpt && hwpt->hwpt_flags & >>>>>>>>>>> IOMMU_HWPT_ALLOC_DIRTY_TRACKING; >>>>>>>>>>> } >>>>>>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>>>>>>>>>> index 34d4be2ce1b1..63ffa46c9652 100644 >>>>>>>>>>> --- a/hw/vfio/migration.c >>>>>>>>>>> +++ b/hw/vfio/migration.c >>>>>>>>>>> @@ -1036,16 +1036,17 @@ bool vfio_migration_realize(VFIODevice >>>>>>>>>>> *vbasedev, >>>>>>>>>>> Error **errp) >>>>>>>>>>> return !vfio_block_migration(vbasedev, err, errp); >>>>>>>>>>> } >>>>>>>>>>> - if (!vbasedev->dirty_pages_supported) { >>>>>>>>>>> + if (!vbasedev->dirty_pages_supported && >>>>>>>>>>> + !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) { >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Some platforms do not have IOMMUFD support and this call will need >>>>>>>>>> some kind of abstract wrapper to reflect dirty tracking support in >>>>>>>>>> the IOMMU backend. >>>>>>>>>> >>>>>>>>> >>>>>>>>> This was actually on purpose because only IOMMUFD presents a view of >>>>>>>>> hardware >>>>>>>>> whereas type1 supporting dirty page tracking is not used as means to >>>>>>>>> 'migration >>>>>>>>> is supported'. >>>>>>>>> >>>>>>>>> The hwpt is nil in type1 and the helper checks that, so it should >>>>>>>>> return >>>>>>>>> false. >>>>>>>>> >>>>>>>> >>>>>>>> Oh wait, maybe you're talking about CONFIG_IOMMUFD=n which I totally >>>>>>>> didn't >>>>>>>> consider. Maybe this would be a elegant way to address it? Looks to >>>>>>>> pass my >>>>>>>> build with CONFIG_IOMMUFD=n >>>>>>>> >>>>>>>> diff --git a/include/hw/vfio/vfio-common.h >>>>>>>> b/include/hw/vfio/vfio-common.h >>>>>>>> index 61dd48e79b71..422ad4a5bdd1 100644 >>>>>>>> --- a/include/hw/vfio/vfio-common.h >>>>>>>> +++ b/include/hw/vfio/vfio-common.h >>>>>>>> @@ -300,7 +300,14 @@ int vfio_devices_query_dirty_bitmap(const >>>>>>>> VFIOContainerBase >>>>>>>> *bcontainer, >>>>>>>> VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error >>>>>>>> **errp); >>>>>>>> int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, >>>>>>>> uint64_t >>>>>>>> iova, >>>>>>>> uint64_t size, ram_addr_t ram_addr, >>>>>>>> Error >>>>>>>> **errp); >>>>>>>> +#ifdef CONFIG_IOMMUFD >>>>>>>> bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt); >>>>>>>> +#else >>>>>>>> +static inline bool iommufd_hwpt_dirty_tracking(VFIOIOASHwpt *hwpt) >>>>>>>> +{ >>>>>>>> + return false; >>>>>>>> +} >>>>>>>> +#endif >>>>>>>> >>>>>>>> /* Returns 0 on success, or a negative errno. */ >>>>>>>> bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp); >>>>>>>> >>>>>>> >>>>>>> hmm, no. You will need to introduce a new Host IOMMU device capability, >>>>>>> something like : >>>>>>> >>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, >>>>>>> >>>>>>> Then, introduce an helper routine to check the capability : >>>>>>> >>>>>>> return hiodc->get_cap( ... >>>>>>> HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING...) >>>>>>> and replace the iommufd_hwpt_dirty_tracking call with it. >>>>>>> >>>>>>> Yeah I know, it's cumbersome but it's cleaner ! >>>>>>> >>>>>> >>>>>> Funny you mention it, because that's what I did in v3: >>>>>> >>>>>> https://lore.kernel.org/qemu-devel/20240708143420.16953-9-joao.m.mart...@oracle.com/ >>>>>> >>>>>> But it was suggested to drop (I am assuming to avoid complexity) >>>>> >>>>> my bad if I did :/ >>>>> >>>> >>>> No worries it is all part of review -- I think Zhenzhong proposed with good >>>> intentions, and I probably didn't think too hard about the consequences on >>>> layering with the HIOD. >>>> >>>>> we will need an helper such as : >>>>> >>>>> bool vfio_device_dirty_tracking(VFIODevice *vbasedev) >>>>> { >>>>> HostIOMMUDevice *hiod = vbasedev->hiod ; >>>>> HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod); >>>>> >>>>> return hiodc->get_cap && >>>>> hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING, >>>>> NULL) >>>>> == 1; >>>>> } >>>>> >>>>> and something like, >>>>> >>>>> static int hiod_iommufd_vfio_get_cap(HostIOMMUDevice *hiod, int cap, >>>>> Error **errp) >>>>> { >>>>> switch (cap) { >>>>> case HOST_IOMMU_DEVICE_CAP_DIRTY_TRACKING: >>>>> return !!(hiod->caps.hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING); >>>>> default: >>>>> error_setg(errp, "%s: unsupported capability %x", hiod->name, >>>>> cap); >>>>> return -EINVAL; >>>>> } >>>>> } >>>>> >>>>> Feel free to propose your own implementation, >>>>> >>>> >>>> Actually it's close to what I had in v3 link, except the new helper (the >>>> name >>>> vfio_device_dirty_tracking is a bit misleading I would call it >>>> vfio_device_iommu_dirty_tracking) >>> >>> Let's call it vfio_device_iommu_dirty_tracking. >>> >> >> I thinking about this and I am not that sure it makes sense. That is the >> .get_cap() stuff. >> >> Using the hw_caps is only useful when choosing hwpt_flags, then the only >> thing >> that matters for patch 12 is after the device is attached ... hence we gotta >> look at hwpt_flags. That ultimately is what tells if dirty tracking can be >> done >> in the device pagetable. >> >> I can expand hiod_iommufd_vfio_get_cap() to return the hwpt flags, but it >> feels >> just as hacky given that I am testing its enablement of the hardware >> pagetable >> (HWPT), and not asking a HIOD capability. > > arf. yes. > >> e.g. hiod_iommufd_vfio_get_cap would make more sense in patch 9 for the >> attach_device() flow[*], but not for vfio_migration_realize() flow. >> >> [*] though feels unneeded as we only have a local callsite, not external >> user so >> far. >> >> Which would technically make v5.1 patch a more correct right check, perhaps >> with >> better layering/naming. > > The quick fix (plan B if needed) would be : > > @@ -1038,8 +1038,11 @@ bool vfio_migration_realize(VFIODevice * > } > > if ((!vbasedev->dirty_pages_supported || > - vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) && > - !iommufd_hwpt_dirty_tracking(vbasedev->hwpt)) { > + vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) > +#ifdef CONFIG_IOMMUFD > + && !iommufd_hwpt_dirty_tracking(vbasedev->hwpt) > +#endif > + ) { > if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) { > error_setg(&err, > "%s: VFIO device doesn't support device and " > > I would prefer to avoid the common component to reference IOMMUFD > directly. The only exception today is the use of the vbasedev->iommufd > pointer which is treated as opaque. > > I guess a simple approach would be to store the result of > iommufd_hwpt_dirty_tracking(hwpt) under a 'dirty_tracking' attribute > of vbasedev and return the value in vfio_device_iommu_dirty_tracking() ? > > if not, let's merge v5 (with more acks) and the fix of plan B. > > >>>> I can follow-up with this improvement in case this gets merged as is, >>> >>> I can't merge as is since it break compiles (I am excluding the v5.1 patch). >>> Which means I would prefer a v6 please. >>> >> >> Ah OK -- I thought this discussion assumed v5.1 to be in which does fix the >> compilation issue and all that remained were acks. > > v5.1 proposes a CONFIG_IOMMUFD in a header file which is error prone. >
hmmm, ok, that's strage. It does look quite common in Qemu? e.g. We even have CONFIG_LINUX in the vfio-common.h header file. >>>> or include >>>> it in the next version if you prefer to adjourn this series into 9.2 >>>> (given the >>>> lack of time to get everything right). >>> >>> There aren't many open questions left. >>> >>> * PATCH 5 lacks a R-b. I would feel more confortable if ZhenZhong or >>> Eric acked the changes. >>> * PATCH 9 is slightly hacky with the use of vfio_device_get_aw_bits(). >>> I think it's minor. I would also feel more confortable if ZhenZhong >>> acked the changes. >> >> I guess you meant patch 6 and not 9. > > yes. > > Thanks, > > C. > > > >> >>> * PATCH 12 needs the fix we have been talking about. >>> * PATCH 13 is for dev/debug. >>> >>> >>> What's important is to avoid introducing regressions in the current >>> behavior, >>> that is when not using IOMMUFD. It looks fine on that aspect AFAICT. >> >> OK >> >