>-----Original Message----- >From: Joao Martins <joao.m.mart...@oracle.com> >Sent: Monday, June 26, 2023 6:08 PM >To: Duan, Zhenzhong <zhenzhong.d...@intel.com> >Cc: alex.william...@redhat.com; c...@redhat.com; qemu-devel@nongnu.org; >avih...@nvidia.com; Peng, Chao P <chao.p.p...@intel.com> >Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize > >On 26/06/2023 08:02, Duan, Zhenzhong wrote: >>> -----Original Message----- >>> From: Duan, Zhenzhong >>> Sent: Sunday, June 25, 2023 2:01 PM >>> To: Joao Martins <joao.m.mart...@oracle.com> >>> Cc: alex.william...@redhat.com; c...@redhat.com; >>> qemu-devel@nongnu.org; avih...@nvidia.com; Peng, Chao P >>> <chao.p.p...@intel.com> >>> Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in >>> vfio_realize >>> >>>> -----Original Message----- >>>> From: Joao Martins <joao.m.mart...@oracle.com> >>>> Sent: Wednesday, June 21, 2023 7:08 PM >>>> To: Duan, Zhenzhong <zhenzhong.d...@intel.com> >>>> Cc: alex.william...@redhat.com; c...@redhat.com; qemu- >>> de...@nongnu.org; >>>> avih...@nvidia.com; Peng, Chao P <chao.p.p...@intel.com> >>>> Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in >>>> vfio_realize >>>> >>>> >>>> >>>> On 21/06/2023 09:02, Zhenzhong Duan wrote: >>>>> When adding migration blocker failed in vfio_migration_realize(), >>>>> hotplug will fail and we see below: >>>>> >>>>> (qemu) device_add >>>>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true >>>>> 0000:81:11.1: VFIO migration is not supported in kernel >>>>> Error: disallowing migration blocker (--only-migratable) for: VFIO >>>>> device doesn't support migration >>>>> >>>>> If we hotplug again we should see same log as above, but we see: >>>>> (qemu) device_add >>>>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true >>>>> Error: vfio 0000:81:11.0: device is already attached >>>>> >>>>> That's because some references to VFIO device isn't released, we >>>>> should check return value of vfio_migration_realize() and release >>>>> the references, then VFIO device will be truely released when >>>>> hotplug failed. >>>>> >>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>>> --- >>>>> hw/vfio/pci.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index >>>>> 73874a94de12..c71b0955d81c 100644 >>>>> --- a/hw/vfio/pci.c >>>>> +++ b/hw/vfio/pci.c >>>>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, >>>>> Error >>>> **errp) >>>>> ret = vfio_migration_realize(vbasedev, errp); >>>>> if (ret) { >>>>> error_report("%s: Migration disabled", >>>>> vbasedev->name); >>>>> + goto out_deregister; >>>>> } >>>>> } >>>>> >>>> This doesn't look right. This means that failure to support >>>> migration will deregister the device. >>> >>> In my understanding, failure to support migration but success to add >>> migration blocker will not deregister device. >>> vfio_migration_realize() is successful in this case. >>> But failure to add migration blocker should deregister device, >>> because >>> vfio_exitfn() is never called in this case(errp set), jumping to >>> out_deregister is the best choice. Then vfio_migration_realize() >>> should return failure in this case. >> > >I was checking other devices in the tree, and I think you are right. Failure to >add the migration blocker results in a failure of device realize. Which IIUC >only >happens in 'only-migratable' setups and during snapshots. Yes.
> >Maybe also including a sentence explainer in the commit message is useful >too. Sure, will do. > >> I just realized the error path in vfio_realize() isn't adequate. We need to >> add >more deregister code just as vfio_exitfn(), see below. I'll re-post after we >are >consensus on this and get some comments of PATCH3. >> >> --- a/hw/vfio/pci.c >> +++ b/hw/vfio/pci.c >> @@ -3210,7 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error >**errp) >> ret = vfio_migration_realize(vbasedev, errp); >> if (ret) { >> error_report("%s: Migration disabled", vbasedev->name); >> - goto out_deregister; >> + goto out_vfio_migration; >> } >> } >> >> @@ -3220,11 +3220,17 @@ static void vfio_realize(PCIDevice *pdev, >> Error **errp) >> >> return; >> >> +out_vfio_migration: >> + vfio_migration_exit(vbasedev); > >This belongs in this patch from the looks Yes, I plan to merge above change to this patch. > >> out_deregister: >> pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); >> if (vdev->irqchip_change_notifier.notify) { >> kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); >> } >> + vfio_disable_interrupts(vdev); >> + if (vdev->intx.mmap_timer) { >> + timer_free(vdev->intx.mmap_timer); >> + } > >But this one suggests another one as it looks a pre-existing issue? Yes, it's another resource leak I just found. Not sure if it's better to also merge above change to this patch which is targeting resource leak issues, or to PATCH2 which is targeting out_deregister path, or to create a new one. Any suggestion? Thanks Zhenzhong