On 4/1/2024 4:34 PM, Li Feng wrote:
Hi Yajun,
I submitted a patch to fix this problem a few months ago, but in
the end it was not accepted and a different solution
was adopted instead.
[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
<https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/>
I think this fix is valid.
This is the merged fix:
[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
<https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git....@redhat.com/>
My tests already included this fix and still failed in the two scenarios I mentioned.
Thanks,
Li
On April 1, 2024, at 10:08, Yajun Wu <yaj...@nvidia.com> wrote:
On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
Hi Yajun,
On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
Hi experts,
With the latest QEMU (8.2.90), we found two vhost-user-blk backend reconnect
failure scenarios:
Do you know if it has ever worked, so that this is a regression, or have we
always had this problem?
I am afraid this commit, "71e076a07d (2022-12-01 02:30:13 -0500)
hw/virtio: generalise CHR_EVENT_CLOSED handling", caused both
failures. The previous commit is good.
I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the
cause; the previous code didn't have this check?
Thanks,
Stefano
1. Disconnect the vhost-user-blk backend before the guest driver probes the
vblk device, then reconnect the backend after the guest driver probes it.
QEMU won't send any vhost messages to restore the backend.
This is because vhost->vdev is NULL before the guest driver probes the vblk
device, so vhost_user_blk_disconnect won't be called and s->connected
is still true. The next vhost_user_blk_connect will simply return
without doing anything.
2. Run modprobe -r virtio-blk inside the VM, then disconnect the backend,
then reconnect the backend, then run modprobe virtio-blk. QEMU won't send
messages in vhost_dev_init.
This is because rmmod makes QEMU call vhost_user_blk_stop, so
vhost->vdev also becomes NULL (in vhost_dev_stop) and
vhost_user_blk_disconnect won't be called. Again s->connected is
still true, even though the chr connection is closed.
I think that even when vhost->vdev is NULL, vhost_user_blk_disconnect should
be called when the chr connection closes?
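To make the two scenarios above easier to follow, here is a minimal toy model in C of the stale-flag behaviour. This is not QEMU code: the struct, the function names (toy_connect, toy_chr_closed, toy_reconnect_with_null_vdev) and the guard_on_vdev switch are all hypothetical, and the model assumes only what is described above, namely that the disconnect step is skipped when vdev is NULL and that the connect step returns early when the connected flag is already set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not QEMU's structures or code. */
struct toy_backend {
    void *vdev;        /* NULL while the guest driver has not started the device */
    bool connected;    /* models s->connected */
    int restore_msgs;  /* counts how often restore messages would be sent */
};

/* Models the connect path: returns early if already marked connected. */
void toy_connect(struct toy_backend *b)
{
    assert(b != NULL);
    if (b->connected) {
        return;        /* stale flag: nothing is sent to the backend */
    }
    b->connected = true;
    b->restore_msgs++;
}

/* Models the chardev close path. With guard_on_vdev set, the disconnect
 * step is skipped whenever vdev is NULL, as described in the report. */
void toy_chr_closed(struct toy_backend *b, bool guard_on_vdev)
{
    assert(b != NULL);
    if (guard_on_vdev && b->vdev == NULL) {
        return;        /* connected stays true */
    }
    b->connected = false;
}

/* Backend drops and comes back while vdev is NULL.
 * Returns the number of restore messages sent on the reconnect. */
int toy_reconnect_with_null_vdev(bool guard_on_vdev)
{
    struct toy_backend b = { .vdev = NULL, .connected = false, .restore_msgs = 0 };

    toy_connect(&b);                    /* initial connection */
    b.restore_msgs = 0;                 /* only count the reconnect */
    toy_chr_closed(&b, guard_on_vdev);  /* backend goes away */
    toy_connect(&b);                    /* backend comes back */
    return b.restore_msgs;
}
```

In this sketch, toy_reconnect_with_null_vdev(true) sends nothing on reconnect because the flag was never cleared, while toy_reconnect_with_null_vdev(false) sends the restore once. Whether dropping the check is safe in the real code path is a separate question that the model does not answer.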
Hope we can have a fix soon.
Thanks,
Yajun