On 4/1/2024 4:34 PM, Li Feng wrote:
*External email: Use caution opening links or attachments*


Hi Yajun,

I submitted a patch to fix this problem a few months ago, but in the end it was not accepted and a different solution
was adopted instead.

[PATCH 1/2] vhost-user: fix lost reconnect - Li Feng <https://lore.kernel.org/all/20230804052954.2918915-2-fen...@smartx.com/>

I think this fix is valid.

This is the merged fix:


[PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin <https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git....@redhat.com/>

My tests already include this fix, and they still fail in the two scenarios I mentioned.


Thanks,
Li

On April 1, 2024, at 10:08, Yajun Wu <yaj...@nvidia.com> wrote:


On 3/27/2024 6:47 PM, Stefano Garzarella wrote:


Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
Hi experts,

With the latest QEMU (8.2.90), we found two vhost-user-blk backend reconnect
failure scenarios:
Do you know whether it has ever worked, which would make this a regression, or have we
always had this problem?

I am afraid this commit caused both failures: 71e076a07d (2022-12-01 02:30:13 -0500) "hw/virtio: generalise CHR_EVENT_CLOSED handling". The previous commit is good.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is the cause; the previous code doesn't seem to have this check?


Thanks,
Stefano

1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect is not called and s->connected stays true. The next vhost_user_blk_connect then simply returns without doing anything.

2. Run modprobe -r virtio-blk inside the VM, then disconnect the backend, reconnect it, and run modprobe virtio-blk. QEMU won't send the messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, where vhost->vdev also becomes NULL (in vhost_dev_stop), so vhost_user_blk_disconnect is not called. Again, s->connected stays true even though the chr connection is closed.

I think vhost_user_blk_disconnect should be called when the chr connection closes, even if vhost->vdev is NULL?
Hope we can have a fix soon.
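
To make the two scenarios easier to follow, here is a minimal standalone C model of the state described above. It is a sketch only and is not QEMU code: struct model, model_connect, model_close, the require_vdev switch and both scenario functions are hypothetical names invented for illustration. The only things taken from the thread are the ideas that a close event is ignored while vhost->vdev is NULL and that a connect returns early while s->connected is still true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only. Field names echo the thread; nothing here is QEMU code. */
struct model {
    void *vdev;      /* stands in for vhost->vdev (NULL while no guest driver is bound) */
    bool connected;  /* stands in for s->connected */
    int init_msgs;   /* how many times the backend was (re)initialised */
};

/* Backend socket opened: a connect that returns early on a stale flag. */
static void model_connect(struct model *m)
{
    assert(m != NULL);
    if (m->connected) {
        return;              /* flag still true: nothing is sent to the backend */
    }
    m->connected = true;
    m->init_msgs++;
}

/*
 * Backend socket closed.
 * require_vdev == true : cleanup is skipped when vdev is NULL (suspected behaviour).
 * require_vdev == false: cleanup always runs (behaviour suggested in the thread).
 */
static void model_close(struct model *m, bool require_vdev)
{
    assert(m != NULL);
    if (require_vdev && m->vdev == NULL) {
        return;
    }
    m->connected = false;
}

/* Scenario 1: backend drops before the guest driver probes, returns after. */
static int scenario_before_probe(bool require_vdev)
{
    static int dummy_vdev;
    struct model m = { .vdev = NULL, .connected = false, .init_msgs = 0 };

    model_connect(&m);               /* initial connection */
    model_close(&m, require_vdev);   /* backend disconnects, vdev still NULL */
    m.vdev = &dummy_vdev;            /* guest driver probes the device */
    model_connect(&m);               /* backend reconnects */
    return m.init_msgs;
}

/* Scenario 2: driver unloaded, backend drops and returns, driver reloaded. */
static int scenario_after_unload(bool require_vdev)
{
    static int dummy_vdev;
    struct model m = { .vdev = NULL, .connected = false, .init_msgs = 0 };

    model_connect(&m);               /* initial connection */
    m.vdev = &dummy_vdev;            /* guest driver bound */
    m.vdev = NULL;                   /* modprobe -r virtio-blk */
    model_close(&m, require_vdev);   /* backend disconnects */
    model_connect(&m);               /* backend reconnects */
    m.vdev = &dummy_vdev;            /* modprobe virtio-blk */
    return m.init_msgs;
}
```

In this model, calling either scenario function with require_vdev set to true returns 1, meaning the reconnect never re-initialises the backend, while calling it with false returns 2. Whether the real fix belongs in the close handler or elsewhere is a separate question that the model does not answer.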


Thanks,
Yajun
