I guess I'll apply this. Can you fix a typo in subject though?
Would also be nice if we had a test for this behaviour.


On Wed, Oct 05, 2022 at 04:17:30PM +0800, Hyman Huang wrote:
> Ping,
>    Hi, Michael and Jason, how does this patchset feel think? :)
>    Sorry if i made noise.
> 
> Yong
> 
> 在 2022/9/26 14:36, huang...@chinatelecom.cn 写道:
> > From: "Hyman Huang(黄勇)" <huang...@chinatelecom.cn>
> > 
> > This patchset aim to fix the unexpected negotiation features for
> > vhost-user netdev interface.
> > 
> > Steps to reproduce the issue:
> > Prepare a vm (CentOS 8 in my work scenario) with vhost-user
> > backend interface and configure qemu as server mode. So dpdk
> > would connect qemu's unix socket periodically.
> > 
> > 1. start vm in background and restart openvswitch service
> >     concurrently and repeatedly in the process of vm start.
> > 
> > 2. check if negotiated virtio features of port is "0x40000000" at
> >     dpdk side by executing:
> >     ovs-vsctl list interface | grep features | grep {port_socket_path}
> > 3. if features equals "0x40000000", go to the vm and check if sending
> >     arp package works, executing:
> >     arping {IP_ADDR}
> >     if vm interface is configured to boot with dhcp protocol, it
> >     would get no ip.
> > 
> > After doing above steps, we'll find the arping not work, the ovs on
> > host side has forwarded unexpected arp packages, which be added 0x0000
> > in the head of ethenet frame.  Though qemu report some error when
> > read/write cmd of vhost protocol during the process of vm start,
> > like the following:
> > 
> > "Failed to set msg fds"
> > "vhost VQ 0 ring restore failed: -22: Invalid argument (22)"
> > 
> > The vm does not stop or report more suggestive error message, it
> > seems that everthing is ok.
> > 
> > The root cause is that dpdk port negotiated nothing but only one
> > VHOST_USER_F_PROTOCOL_FEATURES feature with vhost-user interface at
> > qemu side, which is an unexpected behavior. qemu only load the
> > VHOST_USER_F_PROTOCOL_FEATURES when VHOST_USER_SET_FEATURES and loss
> > the guest features configured by front-end virtio driver using the
> > VIRTIO_PCI_COMMON_GF addr, which is stored in acked_features field
> > of struct vhost_dev.
> > 
> > To explain how the acked_features disappear, we may need to know the
> > lifecyle of acked_features in vhost_dev during feature negotiation.
> > 
> > 1. qemu init acked_features field of struct vhost_dev in vhost_net_init()
> >     by calling vhost_net_ack_features(), the init value fetched from
> >     acked_features field of struct NetVhostUserState, which is the backup
> >     role after vhost stopping or unix socket closed.
> >     In the first time, the acked_features of struct NetVhostUserState is 0
> >     so the init value of vhost_dev's acked_features also 0.
> > 
> > 2. when guest virtio driver set features, qemu accept the features and
> >     call virtio_set_features to store the features as acked_features in
> >     vhost_dev.
> > 
> > 3. when unix socket closed or vhost_dev device doesn't work and be
> >     stopped unexpectedly, qemu will call chr_closed_bh or vhost_user_stop,
> >     which will copy acked_features from vhost_dev to NetVhostUserState and
> >     cleanup the vhost_dev. Since virtio driver not allowed to set features
> >     once status of virtio device changes to VIRTIO_CONFIG_S_FEATURE_OK,
> >     qemu need to backup it in case of loss.
> > 4. once unix socket return to normal and get connected, qemu will
> >     call vhost_user_start to restore the vhost_dev and fetch the
> >     acked_features stored in NetVhostUserState previously.
> > 
> > The above flow works fine in the normal scenarios, but it doesn't cover
> > the scenario that openvswitch service restart in the same time of
> > virtio features negotiation.
> > 
> > Let's analyze such scenario:
> >         qemu                                 dpdk
> > 
> >     vhost_net_init()
> >           |                      systemctl stop openvswitch.service
> >     virtio_set_features()                     |
> >           |                      systemctl start openvswitch.service
> >     virtio_set_status()
> > 
> > Ovs stop service before guset setting virtio features, chr_closed_bh()
> > be called and fetch acked_features in vhost_dev, if may store the
> > incomplete features to NetVhostUserState since it doesn't include
> > guest features, eg "0x40000000".
> > 
> > Guest set virtio features with another features, eg "0x7060a782",
> > this value will store in acked_features of vhost_dev, which is the
> > right and up-to-date features.
> > 
> > After ovs service show up, qemu unix socket get connected and call
> > vhost_user_start(), which will restore acked_features of vhost_dev
> > using NetVhostUserState and "0x40000000" be loaded, which is obsolete.
> > 
> > Guest set virtio device status and therefore qemu call
> > virtio_net_vhost_status finally, checking if vhost-net device has
> > started, start it if not, consequently the obsolete acked_features
> > "0x40000000" be negotiated after calling vhost_dev_set_features().
> > 
> > So the key point of solving this issue making the acked_features
> > in NetVhostUserState up-to-date, these patchset provide this
> > solution.
> > 
> > [PATCH 1/2]: Abstract the existing code of saving acked_features
> >               into vhost_user_save_acked_features so the next
> >               patch seems clean.
> > 
> > [PATCH 2/2]: Save the acked_features to NetVhostUserState once
> >               Guest virtio driver configured. This step makes
> >               acked_features in NetVhostUserState up-to-date.
> > 
> > Please review, any comments and suggestions are welcome.
> > 
> > Best regard.
> > 
> > Yong
> > 
> > Hyman Huang (2):
> >    vhost-user: Refactor vhost acked features saving
> >    vhost-net: Fix the virito features negotiation flaw
> > 
> >   hw/net/vhost_net.c       |  9 +++++++++
> >   hw/net/virtio-net.c      |  5 +++++
> >   include/net/vhost-user.h |  2 ++
> >   include/net/vhost_net.h  |  2 ++
> >   net/vhost-user.c         | 35 +++++++++++++++++++----------------
> >   5 files changed, 37 insertions(+), 16 deletions(-)
> > 
> 
> -- 
> Best regard
> 
> Hyman Huang(黄勇)


Reply via email to