> 2023年8月14日 下午8:11,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: > > Why can’t we rather fix this by adding a “event_cb” param to > vhost_user_async_close and then call qemu_chr_fe_set_handlers in > vhost_user_async_close_bh()? > > Even if calling vhost_dev_cleanup() twice is safe today I worry future > changes may easily stumble over the reconnect case and introduce crashes or > double frees. > I think add a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’ has been called in vhost_user_async_close, and will be called in event->cb, so why need add a new event_cb? For avoiding to call the vhost_dev_cleanup() twice, add a ‘inited’ in struct vhost-dev to mark if it’s inited like this:
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index e2f6ffb446..edc80c0231 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, goto fail_busyloop; } + hdev->inited = true; return 0; fail_busyloop: @@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev) { int i; + if (!hdev->inited) { + return; + } + hdev->inited = false; trace_vhost_dev_cleanup(hdev); for (i = 0; i < hdev->nvqs; ++i) { diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index ca3131b1af..74b1aec960 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -123,6 +123,7 @@ struct vhost_dev { /* @started: is the vhost device started? */ bool started; bool log_enabled; + bool inited; uint64_t log_size; Error *migration_blocker; const VhostOps *vhost_ops; Thanks. > >> On Aug 4, 2023, at 1:29 AM, Li Feng <fen...@smartx.com> wrote: >> >> When the vhost-user is reconnecting to the backend, and if the vhost-user >> fails >> at the get_features in vhost_dev_init(), then the reconnect will fail >> and it will not be retriggered forever. >> >> The reason is: >> When the vhost-user fail at get_features, the vhost_dev_cleanup will be >> called >> immediately. >> >> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'. >> >> The reconnect path is: >> vhost_user_blk_event >> vhost_user_async_close(.. vhost_user_blk_disconnect ..) >> qemu_chr_fe_set_handlers <----- clear the notifier callback >> schedule vhost_user_async_close_bh >> >> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be >> called, then the event fd callback will not be reinstalled. >> >> With this patch, the vhost_user_blk_disconnect will call the >> vhost_dev_cleanup() again, it's safe. >> >> All vhost-user devices have this issue, including vhost-user-blk/scsi. >> >> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling") >> >> Signed-off-by: Li Feng <fen...@smartx.com> >> --- >> hw/virtio/vhost-user.c | 10 +--------- >> 1 file changed, 1 insertion(+), 9 deletions(-) >> >> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >> index 8dcf049d42..697b403fe2 100644 >> --- a/hw/virtio/vhost-user.c >> +++ b/hw/virtio/vhost-user.c >> @@ -2648,16 +2648,8 @@ typedef struct { >> static void vhost_user_async_close_bh(void *opaque) >> { >> VhostAsyncCallback *data = opaque; >> - struct vhost_dev *vhost = data->vhost; >> >> - /* >> - * If the vhost_dev has been cleared in the meantime there is >> - * nothing left to do as some other path has completed the >> - * cleanup. >> - */ >> - if (vhost->vdev) { >> - data->cb(data->dev); >> - } >> + data->cb(data->dev); >> >> g_free(data); >> } >> -- >> 2.41.0 >> >