On 6/7/23 08:48, Xia, Chenbo wrote:
Hi Maxime,
-----Original Message-----
From: Maxime Coquelin <maxime.coque...@redhat.com>
Sent: Tuesday, June 6, 2023 4:18 PM
To: dev@dpdk.org; Xia, Chenbo <chenbo....@intel.com>;
david.march...@redhat.com; m...@redhat.com; f...@redhat.com;
jasow...@redhat.com; Liang, Cunming <cunming.li...@intel.com>; Xie, Yongji
<xieyon...@bytedance.com>; echau...@redhat.com; epere...@redhat.com;
amore...@redhat.com; l...@redhat.com
Cc: Maxime Coquelin <maxime.coque...@redhat.com>
Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
This series introduces a new type of backend, VDUSE,
to the Vhost library.
VDUSE stands for vDPA Device in Userspace. It enables
implementing a Virtio device in userspace and having it
attached to the Kernel vDPA bus.
Once attached to the vDPA bus, it can be used by
Kernel Virtio drivers, like virtio-net in our case, via
the virtio-vdpa driver. In that case, the device is visible
to the Kernel networking stack and is exposed to userspace
as a regular netdev.
It can also be exposed to userspace thanks to the
vhost-vdpa driver, via a vhost-vdpa chardev that can be
passed to QEMU or Virtio-user PMD.
While VDUSE support is already available in upstream
Kernel, a couple of patches are required to support
network device type:
https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.
Benchmark results:
==================
On this v2, PVP reference benchmark has been run & compared with
Vhost-user.
When doing macswap forwarding in the workload, no difference is seen.
When doing io forwarding in the workload, we see 4% performance
degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
explained by the use of the IOTLB layer in the Vhost-library when using
VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
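To make the IOTLB cost concrete, here is a minimal Python sketch of an IOVA-to-user-address translation cache of the kind every descriptor access has to go through when an IOTLB is in the path. It is an illustration under assumptions, not the Vhost library's implementation: the names IotlbEntry, IotlbCache, insert and translate are hypothetical, and the addresses are made up.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class IotlbEntry:
    """One mapping: [iova, iova + size) -> [uaddr, uaddr + size). Hypothetical layout."""
    iova: int
    uaddr: int
    size: int


class IotlbCache:
    """Entries kept sorted by start IOVA so lookups can use binary search."""

    def __init__(self):
        self._starts = []    # start IOVA of each entry, sorted
        self._entries = []   # entries, in the same order as _starts

    def insert(self, entry):
        idx = bisect.bisect_left(self._starts, entry.iova)
        self._starts.insert(idx, entry.iova)
        self._entries.insert(idx, entry)

    def translate(self, iova):
        """Return the user address for iova, or None on an IOTLB miss."""
        idx = bisect.bisect_right(self._starts, iova) - 1
        if idx < 0:
            return None
        entry = self._entries[idx]
        if iova >= entry.iova + entry.size:
            return None
        return entry.uaddr + (iova - entry.iova)


cache = IotlbCache()
cache.insert(IotlbEntry(iova=0x1000, uaddr=0x7F0000000000, size=0x2000))
hit = cache.translate(0x1800)    # inside the mapped range
miss = cache.translate(0x3000)   # first byte past the mapped range
print(hex(hit), miss)
```

On a miss, a real backend would have to request the missing mapping and retry the translation; that part is left out of the sketch.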
Usage:
======
1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa
2. Build (requires VDUSE kernel headers to be available)
# meson build
# ninja -C build
3. Create a VDUSE device (vduse0) using Vhost PMD with
testpmd (with 4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4
4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse
=> The virtio-net netdev shows up (eth0 here)
# ip l show eth0
21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
5. Start/stop traffic in testpmd
testpmd> start
testpmd> show port stats 0
######################## NIC statistics for port 0 ########################
RX-packets: 11 RX-missed: 0 RX-bytes: 1482
RX-errors: 0
RX-nombuf: 0
TX-packets: 1 TX-errors: 0 TX-bytes: 62
Throughput (since last show)
Rx-pps: 0 Rx-bps: 0
Tx-pps: 0 Tx-bps: 0
############################################################################
testpmd> stop
6. Detach the VDUSE device from the vDPA bus
# vdpa dev del vduse0
7. Quit testpmd
testpmd> quit
Known issues & remaining work:
==============================
- Fix issue in FD manager (still polling while FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
-> a temporary patch to support reconnection via a tmpfs file is available,
upstream solution would be in-kernel and is being developed.
-> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
- Support packed ring
- Provide more performance benchmark results
Changes in v5:
==============
- Delay starting/stopping the device until after having replied to the VDUSE
event in order to avoid a deadlock encountered when testing with OVS.
Could you explain more to help me understand the deadlock issue?
Sure.
The v5 fixes an ABBA deadlock involving the OVS mutex and the kernel
rtnl_lock(), two OVS threads and the vdpa tool process.
We have an OVS bridge with a mlx5 port already added.
We add the vduse port to the same bridge.
Then we use the iproute2 vdpa tool to attach the vduse device to the
kernel vdpa bus. When doing this, the rtnl lock is taken when the
virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
for its reply.
This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if the DRIVER_OK bit is set, the Vhost .new_device()
callback is called, which triggers a bridge reconfiguration.
On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl() which tries to take the rtnl lock, but it is already
owned by the vdpa tool.
The vduse_events thread is stuck waiting for the OVS mutex, so the
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stuck for 30 seconds, until a timeout happens.
When the timeout happens, everything is unblocked, but the VDUSE device
has been marked as broken, and so is not usable anymore.
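The wait cycle described above can be written down as a small wait-for graph. The Python sketch below is only a model of the description in this mail, not code from OVS, DPDK or the kernel: the party names are labels chosen for illustration, and find_cycle is a hypothetical helper.

```python
# Wait-for graph: each blocked party maps to (who it waits for, what it waits on).
WAITS_FOR = {
    "vdpa tool": (
        "vduse_events thread",
        "reply to VDUSE_SET_STATUS (rtnl lock held meanwhile)",
    ),
    "vduse_events thread": (
        "OVS thread reconfiguring the mlx5 port",
        "OVS mutex",
    ),
    "OVS thread reconfiguring the mlx5 port": (
        "vdpa tool",
        "rtnl lock",
    ),
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node in waits_for:
        if node in path:
            return path[path.index(node):]
        path.append(node)
        node = waits_for[node][0]
    return None


cycle = find_cycle(WAITS_FOR, "vdpa tool")
for party in cycle:
    blocker, reason = WAITS_FOR[party]
    print(f"{party} waits for {blocker}: {reason}")
```

Every party in the cycle is blocked on the next one, so nothing moves until the vdpa tool times out.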
I could reproduce and provide you the backtraces of the different
threads if you wish.
Anyway, I think it makes sense to perform the device startup after
having replied to the VDUSE_SET_STATUS request, as it just means the
device has taken into account the new status of the driver.
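The effect of the ordering can be tried out with a small Python threading model. This is a sketch under simplifying assumptions, not the DPDK code: the two OVS threads are collapsed into one, the 30-second timeout is shortened, and all names (run, start_device, vdpa_tool, vduse_events) are made up for the illustration.

```python
import threading


def run(reply_first, timeout=0.2):
    """Simulate one VDUSE_SET_STATUS exchange and return the ordered event log."""
    rtnl = threading.Lock()        # stands in for the kernel rtnl lock
    ovs_mutex = threading.Lock()   # stands in for the OVS mutex
    status_sent = threading.Event()
    reply_sent = threading.Event()
    log = []

    def start_device():
        # Stands in for .new_device(): the bridge reconfiguration takes the
        # OVS mutex, then needs the rtnl lock for an ioctl().
        with ovs_mutex:
            with rtnl:
                log.append("device started")

    def vdpa_tool():
        # The probe holds the rtnl lock while waiting for the reply.
        with rtnl:
            status_sent.set()
            if reply_sent.wait(timeout):
                log.append("reply received")
            else:
                log.append("timeout, device marked broken")

    def vduse_events():
        status_sent.wait()
        if reply_first:
            reply_sent.set()   # reply first, then start the device
            start_device()
        else:
            start_device()     # blocks until the vdpa tool gives up
            reply_sent.set()

    threads = [threading.Thread(target=f) for f in (vdpa_tool, vduse_events)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


print(run(reply_first=False))
print(run(reply_first=True))
```

With reply_first=False the model only makes progress once the simulated vdpa tool times out; with reply_first=True the reply arrives before the device start needs the rtnl lock.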
Hope it clarifies, let me know if you need more details.
Thanks,
Maxime
Thanks,
Chenbo
- Mention the lack of reconnection support in the release note.
Changes in v4:
==============
- Applied patch 1 and patch 2 from v3
- Rebased on top of Eelco series
- Fix coredump clear in IOTLB cache removal (David)
- Remove unneeded ret variable in vhost_vring_inject_irq (David)
- Fixed release note (David, Chenbo)
Changes in v2/v3:
=================
- Fixed mem_set_dump() parameter (patch 4)
- Fixed accidental comment change (patch 7, Chenbo)
- Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
- move change from patch 12 to 13 (Chenbo)
- Enable locks annotation for control queue (Patch 17)
- Send control queue notification when used descriptors enqueued (Patch 17)
- Lock control queue IOTLB lock (Patch 17)
- Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
- Set VDUSE dev FD as NONBLOCK (Patch 18)
- Enable more Virtio features (Patch 18)
- Remove calls to pthread_setcancelstate() (Patch 22)
- Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
(Patch 22)
- Use RTE_DIM() to get requests string array size (Patch 22)
- Set reply result for IOTLB update message (Patch 25, Chenbo)
- Fix queues enablement with multiqueue (Patch 26)
- Move kickfd creation for better logging (Patch 26)
- Improve logging (Patch 26)
- Uninstall cvq kickfd in case of handler installation failure (Patch 27)
- Enable CVQ notifications once handler is installed (Patch 27)
- Don't advertise multiqueue and control queue if app only request single
queue pair (Patch 27)
- Add release notes
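As background for the __builtin_ctz to __builtin_ctzll item in the list above: __builtin_ctz takes an unsigned int, while __builtin_ctzll takes an unsigned long long, and both are undefined for a zero argument. The Python sketch below models what happens when a 64-bit value whose only set bit is above bit 31 is truncated to 32 bits. The helper ctz and the value used are assumptions for illustration; the mail does not say which variable the patch applies the builtin to.

```python
def ctz(value, width):
    """Count trailing zero bits of value truncated to width bits.

    Returns None when the truncated value is 0, which is the case where
    the C builtins have an undefined result.
    """
    value &= (1 << width) - 1
    if value == 0:
        return None
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


bitmap = 1 << 34            # arbitrary 64-bit value with a single high bit set
as_32bit = ctz(bitmap, 32)  # models passing the value through a 32-bit unsigned int
as_64bit = ctz(bitmap, 64)  # models the 64-bit variant
print(as_32bit, as_64bit)
```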
Maxime Coquelin (26):
vhost: fix IOTLB entries overlap check with previous entry
vhost: add helper of IOTLB entries coredump
vhost: add helper for IOTLB entries shared page check
vhost: don't dump unneeded pages with IOTLB
vhost: change to single IOTLB cache per device
vhost: add offset field to IOTLB entries
vhost: add page size info to IOTLB entry
vhost: retry translating IOVA after IOTLB miss
vhost: introduce backend ops
vhost: add IOTLB cache entry removal callback
vhost: add helper for IOTLB misses
vhost: add helper for interrupt injection
vhost: add API to set max queue pairs
net/vhost: use API to set max queue pairs
vhost: add control virtqueue support
vhost: add VDUSE device creation and destruction
vhost: add VDUSE callback for IOTLB miss
vhost: add VDUSE callback for IOTLB entry removal
vhost: add VDUSE callback for IRQ injection
vhost: add VDUSE events handler
vhost: add support for virtqueue state get event
vhost: add support for VDUSE status set event
vhost: add support for VDUSE IOTLB update event
vhost: add VDUSE device startup
vhost: add multiqueue support to VDUSE
vhost: add VDUSE device stop
doc/guides/prog_guide/vhost_lib.rst | 4 +
doc/guides/rel_notes/release_23_07.rst | 12 +
drivers/net/vhost/rte_eth_vhost.c | 3 +
lib/vhost/iotlb.c | 333 +++++++------
lib/vhost/iotlb.h | 45 +-
lib/vhost/meson.build | 5 +
lib/vhost/rte_vhost.h | 17 +
lib/vhost/socket.c | 72 ++-
lib/vhost/vduse.c | 646 +++++++++++++++++++++++++
lib/vhost/vduse.h | 33 ++
lib/vhost/version.map | 1 +
lib/vhost/vhost.c | 70 ++-
lib/vhost/vhost.h | 57 ++-
lib/vhost/vhost_user.c | 51 +-
lib/vhost/vhost_user.h | 2 +-
lib/vhost/virtio_net_ctrl.c | 286 +++++++++++
lib/vhost/virtio_net_ctrl.h | 10 +
17 files changed, 1409 insertions(+), 238 deletions(-)
create mode 100644 lib/vhost/vduse.c
create mode 100644 lib/vhost/vduse.h
create mode 100644 lib/vhost/virtio_net_ctrl.c
create mode 100644 lib/vhost/virtio_net_ctrl.h
--
2.40.1