Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end it was not accepted and a different fix was merged instead.
https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/

This is the merged fix:
https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git....@redhat.com/

Thanks,
Li

> On April 1, 2024, at 10:08, Yajun Wu <yaj...@nvidia.com> wrote:
>
>
> On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
>> Hi Yajun,
>>
>> On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
>>> Hi experts,
>>>
>>> With latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
>>> failure scenarios:
>> Do you know if it has ever worked and so it's a regression, or have we
>> always had this problem?
>
> I am afraid this commit: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio:
> generalise CHR_EVENT_CLOSED handling" caused both failures. The previous
> hash is good.
>
> I suspect the "if (vhost->vdev)" in vhost_user_async_close_bh is the cause;
> the previous code doesn't have this check?
>
>>
>> Thanks,
>> Stefano
>>
>>> 1. Disconnect the vhost-user-blk backend before the guest driver probes
>>> the vblk device, then reconnect the backend after the guest driver probes
>>> the device. QEMU won't send out any vhost messages to restore the backend.
>>> This is because vhost->vdev is NULL before the guest driver probes the
>>> vblk device, so vhost_user_blk_disconnect won't be called and
>>> s->connected is still true. The next vhost_user_blk_connect will simply
>>> return without doing anything.
>>>
>>> 2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then
>>> reconnect the backend, then modprobe virtio-blk. QEMU won't send messages
>>> in vhost_dev_init.
>>> This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev
>>> also becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect won't
>>> be called. Again s->connected is still true, even though the chr
>>> connection is closed.
>>>
>>> I think that even when vhost->vdev is NULL, vhost_user_blk_disconnect
>>> should be called when the chr connection closes?
>>> Hope we can have a fix soon.
>>>
>>>
>>> Thanks,
>>> Yajun
>>>
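
For readers following the thread, scenario 1 from the quoted report can be sketched as a toy C model. This is only an illustration of the behaviour as the report describes it: the struct and every toy_* name below are hypothetical stand-ins and are not QEMU's actual code, and the guard_on_vdev flag simply switches the reported "if (vhost->vdev)" style check on or off.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not QEMU's real definitions. */
struct toy_dev {
    void *vdev;      /* stands in for vhost->vdev; NULL until the guest probes */
    bool connected;  /* stands in for s->connected */
    int init_msgs;   /* how many times the backend was (re)initialised */
};

/* Models the connect path: returns early if already marked connected. */
static void toy_connect(struct toy_dev *d)
{
    if (d->connected) {
        return;
    }
    d->connected = true;
    d->init_msgs++;
}

/* Models the disconnect path: clears the connected flag. */
static void toy_disconnect(struct toy_dev *d)
{
    d->connected = false;
}

/* Models the chardev close handler. With guard_on_vdev set, the
 * disconnect is skipped while vdev is NULL, as the report describes. */
static void toy_chr_closed(struct toy_dev *d, bool guard_on_vdev)
{
    if (guard_on_vdev && d->vdev == NULL) {
        return;
    }
    toy_disconnect(d);
}

/* Scenario 1: the backend drops and comes back before the guest driver
 * has probed the device. Returns the total number of initialisations. */
int toy_scenario1(bool guard_on_vdev)
{
    struct toy_dev d = { .vdev = NULL, .connected = false, .init_msgs = 0 };

    toy_connect(&d);                    /* initial connection */
    toy_chr_closed(&d, guard_on_vdev);  /* backend goes away before probe */
    toy_connect(&d);                    /* backend comes back */
    return d.init_msgs;
}
```

In this model, toy_scenario1(true) returns 1: the flag stays stale, so the second connect is a no-op and the backend is never restored. toy_scenario1(false) returns 2, which corresponds to the suggestion in the report that the disconnect should run even when vdev is NULL. It says nothing about how the merged fix is implemented.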