On Fri, Sep 04, 2020 at 12:31:13PM +0300, Dima Stepanov wrote: > vhost-user devices can get a disconnect in the middle of the VHOST-USER > handshake on the migration start. If disconnect event happened right > before sending next VHOST-USER command, then the vhost_dev_set_log() > call in the vhost_migration_log() function will return error. This error > will lead to the assert() and close the QEMU migration source process. > For the vhost-user devices the disconnect event should not break the > migration process, because: > - the device will be in the stopped state, so it will not be changed > during migration > - if reconnect will be made the migration log will be reinitialized as > part of reconnect/init process: > #0 vhost_log_global_start (listener=0x563989cf7be0) > at hw/virtio/vhost.c:920 > #1 0x000056398603d8bc in listener_add_address_space > (listener=0x563989cf7be0, > as=0x563986ea4340 <address_space_memory>) > at softmmu/memory.c:2664 > #2 0x000056398603dd30 in memory_listener_register > (listener=0x563989cf7be0, > as=0x563986ea4340 <address_space_memory>) > at softmmu/memory.c:2740 > #3 0x0000563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8, > opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER, > busyloop_timeout=0) > at hw/virtio/vhost.c:1385 > #4 0x0000563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990) > at hw/block/vhost-user-blk.c:315 > #5 0x0000563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990, > event=CHR_EVENT_OPENED) > at hw/block/vhost-user-blk.c:379 > Update the vhost-user-blk device with the internal started_vu field which > will be used for initialization (vhost_user_blk_start) and clean up > (vhost_user_blk_stop). This additional flag in the VhostUserBlk structure > will be used to track whether the device really needs to be stopped and > cleaned up on a vhost-user level. > The disconnect event will set the overall VHOST device (not vhost-user) to > the stopped state, so it can be used by the general vhost_migration_log > routine. > Such approach could be propogated to the other vhost-user devices, but > better idea is just to make the same connect/disconnect code for all the > vhost-user devices. > > This migration issue was slightly discussed earlier: > - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html > - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html > > Signed-off-by: Dima Stepanov <dimas...@yandex-team.ru> > Reviewed-by: Raphael Norwitz <raphael.norw...@nutanix.com> > --- > hw/block/vhost-user-blk.c | 19 ++++++++++++++++--- > hw/virtio/vhost.c | 27 ++++++++++++++++++++++++--- > include/hw/virtio/vhost-user-blk.h | 10 ++++++++++ > 3 files changed, 50 insertions(+), 6 deletions(-) > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > index 39aec42..a076b1e 100644 > --- a/hw/block/vhost-user-blk.c > +++ b/hw/block/vhost-user-blk.c > @@ -150,6 +150,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev) > error_report("Error starting vhost: %d", -ret); > goto err_guest_notifiers; > } > + s->started_vu = true; > > /* guest_notifier_mask/pending not used yet, so just unmask > * everything here. virtio-pci will do the right thing by > @@ -175,6 +176,11 @@ static void vhost_user_blk_stop(VirtIODevice *vdev) > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > int ret; > > + if (!s->started_vu) { > + return; > + } > + s->started_vu = false; > + > if (!k->set_guest_notifiers) { > return; > } > @@ -341,9 +347,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev) > } > s->connected = false; > > - if (s->dev.started) { > - vhost_user_blk_stop(vdev); > - } > + vhost_user_blk_stop(vdev); > > vhost_dev_cleanup(&s->dev); > } > @@ -399,6 +403,15 @@ static void vhost_user_blk_event(void *opaque, > QEMUChrEvent event) > NULL, NULL, false); > aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, > opaque); > } > + > + /* > + * Move vhost device to the stopped state. The vhost-user device > + * will be clean up and disconnected in BH. This can be useful in > + * the vhost migration code. If disconnect was caught there is an > + * option for the general vhost code to get the dev state without > + * knowing its type (in this case vhost-user). > + */ > + s->dev.started = false; > break; > case CHR_EVENT_BREAK: > case CHR_EVENT_MUX_IN:
Hi Dima, Is it possible to move this logic into the vhost_*() API so that all devices benefit from it? This seems like a generic vhost-user issue rather than a vhost-user-blk issue. In other words, it would be great if the vhost APIs in QEMU are designed in a way so that the user doesn't need to think about this. Stefan
signature.asc
Description: PGP signature