I believe I hit this bug too, on Xenial with kernel 4.10.0-26-generic on amd64. We use multiple multiqueue virtual adapters per VM, so only partial connectivity loss occurs.
With systemtap I can see that on the host, last_used_event from the patch equals 0x0 on one of the queues, and the guest receives 0x1 as the return value for that queue from __dev_queue_xmit, which I believe is NET_XMIT_DROP.

--
https://bugs.launchpad.net/bugs/1711251

Title:
  vhost guest network randomly drops under stress (kvm)

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Zesty:
  In Progress

Bug description:
  == SRU Justification ==

  A vhost performance patch was introduced in the 4.10 kernel upstream and is currently included in the Zesty 4.10 kernel:

    commit 809ecb9bca6a9424ccd392d67e368160f8b76c92
    Author: Jason Wang <jasow...@redhat.com>
    Date: Mon Dec 12 14:46:49 2016 +0800

        vhost: cache used event for better performance

  However, I recently hit a functional issue linked to this patch which causes random guests to lose their network connection under stress. It is not architecture specific and is more likely to be hit under high network stress (i.e. lots of uperf instances).

  The patch author has now reverted this patch upstream:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/vhost?id=8d65843c44269c21e95c98090d9bb4848d473853

  which reads:

  "
    Revert "vhost: cache used event for better performance"

    This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it was reported to break vhost_net. We want to cache used event and use it to check for notification. The assumption was that guest won't move the event idx back, but this could happen in fact when 16 bit index wraps around after 64K entries.

    Signed-off-by: Jason Wang <jasow...@redhat.com>
    Acked-by: Michael S. Tsirkin <m...@redhat.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>
  "

  I am requesting that this revert of the problematic patch be pulled into Ubuntu Zesty (anything 4.10+).

  ---uname output---
  Linux p82qvirt 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9 09:19:19 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = 8247-22L

  ---Steps to Reproduce---
  I can recreate the scenario with the following setup:
  - on a 20core host, start 20 1core VMs
  - I have a single linux bridge assigned to all guests using virtio
  - start a uperf benchmark between each guest pair (10 total) using a high number of uperf nprocs (32)
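
For anyone who wants to see why the 16-bit wrap-around matters, here is a rough Python model of the index comparison that the kernel's vring_need_event() helper performs. It is only a sketch: the function name need_event and the sample index values are made up for illustration, and it does not reproduce the logic of the reverted patch. It only shows that a stale cached event index can give a different answer than the value the guest has actually published.

```python
MASK = 0xFFFF  # ring indices are 16 bits wide and wrap modulo 65536


def need_event(event_idx, new_idx, old_idx):
    """Model of vring_need_event(): True when event_idx lies in
    [old_idx, new_idx), with all arithmetic done modulo 2**16."""
    return ((new_idx - event_idx - 1) & MASK) < ((new_idx - old_idx) & MASK)


# The used index moved from 65534 to 2, crossing the wrap point.
# The guest asked to be notified at index 0, which is inside that range.
crossed_wrap = need_event(event_idx=0, new_idx=2, old_idx=65534)

# Hypothetical stale-cache case: the host still holds an old event index
# (65000) while the guest has since published a new one (1).
fresh = need_event(event_idx=1, new_idx=2, old_idx=0)
stale = need_event(event_idx=65000, new_idx=2, old_idx=0)

if __name__ == "__main__":
    print("crossed wrap, notify:", crossed_wrap)  # True
    print("fresh event idx, notify:", fresh)      # True
    print("stale event idx, notify:", stale)      # False
```

With the fresh value the host would notify the guest; with the stale one it would not, which is the kind of missed notification that could leave a queue stuck.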