On 29/06/2023 09:40, Zhenzhong Duan wrote: > When vfio_realize() succeeds, hot unplug will call vfio_exitfn() > to free resources allocated in vfio_realize(); when vfio_realize() > fails, vfio_exitfn() is never called and we need to free resources > in vfio_realize(). > > In the case that vfio_migration_realize() fails, > e.g: with -only-migratable & enable-migration=off, we see below: > > (qemu) device_add > vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off > 0000:81:11.1: Migration disabled > Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: > Migration is disabled for VFIO device > > If we hotplug again we should see same log as above, but we see: > (qemu) device_add > vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off > Error: vfio 0000:81:11.1: device is already attached > > That's because some references to VFIO device isn't released, > we should check return value of vfio_migration_realize() and > release the references, then VFIO device will be truely > released when hotplug fails. > > Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable") > Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> > --- > hw/vfio/pci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index 54a8179d1c64..dc69d3031b24 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) > ret = vfio_migration_realize(vbasedev, errp); > if (ret) { > error_report("%s: Migration disabled", vbasedev->name); > + goto out_vfio_migration; > } > } > > @@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) > > return; > > +out_vfio_migration: > + vfio_migration_exit(vbasedev); > out_deregister: > vfio_disable_interrupts(vdev); > out_intx_disable:
I agree with the general sentiment behind the change. Clearly vfio::migration and vfio::migration_blocker are leaking from inside the migration_realize() function. But it is kinda awkward semantic that vfio_migration_realize() (or any realize) failures need to be accompanied with a vfio_migration_exit() that tears down state *leaked* by its realize() failure. It sounds to me that this should be inside the vfio_migration_realize() not on the caller? Unless QEMU ::realize() is expected to do this.