On Wed, Apr 15, 2020 at 11:28:23AM +0800, Li Feng wrote:
>
>      switch (event) {
>      case CHR_EVENT_OPENED:
> @@ -363,7 +376,16 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>          }
>          break;
>      case CHR_EVENT_CLOSED:
> -        vhost_user_blk_disconnect(dev);
> +        /*
> +         * a close event may happen during a read/write, but vhost
> +         * code assumes the vhost_dev remains setup, so delay the
> +         * stop & clear to idle.
> +         */
> +        ctx = qemu_get_current_aio_context();
> +
> +        qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
> +                                 NULL, NULL, NULL, false);
> +        aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
This seems a bit racy. What's to stop the async operation from executing
before the next read?

If the issue is just that the vhost_dev state is being destroyed too early,
why don't we instead move the vhost_dev_cleanup() call from
vhost_user_blk_disconnect() to vhost_user_blk_connect()? We may need to add
some state to tell whether this is the first connect or a reconnect, so that
we don't call vhost_dev_cleanup() on the initial connect, but as long as we
call vhost_dev_cleanup() before vhost_dev_init() every time the device
reconnects, it shouldn't matter that we keep that state around.

Thoughts?
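Roughly something like this, say (completely untested sketch; the
'vhost_initialized' field is made up, and I've elided the parts of
vhost_user_blk_connect() that wouldn't change):

static int vhost_user_blk_connect(DeviceState *dev)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
    VHostUserBlk *s = VHOST_USER_BLK(vdev);
    int ret;

    if (s->connected) {
        return 0;
    }
    s->connected = true;

    /*
     * On a reconnect, tear down the vhost_dev state left over from the
     * previous connection immediately before re-initializing it, rather
     * than in the disconnect path. The (hypothetical) vhost_initialized
     * flag just skips the cleanup on the very first connect, where there
     * is nothing to clean up yet.
     */
    if (s->vhost_initialized) {
        vhost_dev_cleanup(&s->dev);
    }
    s->vhost_initialized = true;

    /* ... existing s->dev setup ... */

    ret = vhost_dev_init(&s->dev, &s->vhost_user,
                         VHOST_BACKEND_TYPE_USER, 0);
    if (ret < 0) {
        return ret;
    }

    /* ... rest of connect unchanged ... */
    return 0;
}

vhost_user_blk_disconnect() would then drop its vhost_dev_cleanup() call and
only stop the device, leaving the stale vhost_dev around until the next
connect cleans it up.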