[PATCH 4/4] vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature.

Co-developed-by: Eugenio Pérez
Signed-off-by: Eugenio Pérez
Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 0787dd933b..dfd271c456 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -111,6 +111,7 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_STATUS) |
     BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
     BIT_ULL(VIRTIO_NET_F_CTRL_RX) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_VLAN) |
     BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) |
     BIT_ULL(VIRTIO_NET_F_MQ) |
     BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
--
2.25.1
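For context on the one-line change above: vdpa_svq_device_features is the bitmask of features that SVQ is prepared to shadow, so adding a BIT_ULL() entry is what allows the guest to negotiate the feature. A minimal, self-contained sketch of that bitmask pattern (plain C rather than the QEMU source; the bit number 19 is taken from the virtio spec and is an assumption of this sketch):

```c
#include <stdint.h>
#include <stdio.h>

#define BIT_ULL(nr) (1ULL << (nr))
/* Bit number from the virtio spec; an assumption for this sketch */
#define VIRTIO_NET_F_CTRL_VLAN 19

int main(void)
{
    /* A device feature mask in the style of vdpa_svq_device_features */
    uint64_t features = BIT_ULL(VIRTIO_NET_F_CTRL_VLAN);

    /* Gating check: is CTRL_VLAN among the offered features? */
    if (features & BIT_ULL(VIRTIO_NET_F_CTRL_VLAN)) {
        printf("VIRTIO_NET_F_CTRL_VLAN can be offered to the guest\n");
    }
    return 0;
}
```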
[PATCH 3/4] vdpa: Restore vlan filtering state
This patch introduces vhost_vdpa_net_load_single_vlan() and
vhost_vdpa_net_load_vlan() to restore the vlan filtering state at
the device's startup.

Co-developed-by: Eugenio Pérez
Signed-off-by: Eugenio Pérez
Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 49
 1 file changed, 49 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9795306742..0787dd933b 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -965,6 +965,51 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
     return 0;
 }
 
+static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s,
+                                           const VirtIONet *n,
+                                           uint16_t vid)
+{
+    const struct iovec data = {
+        .iov_base = &vid,
+        .iov_len = sizeof(vid),
+    };
+    ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN,
+                                                  VIRTIO_NET_CTRL_VLAN_ADD,
+                                                  &data, 1);
+    if (unlikely(dev_written < 0)) {
+        return dev_written;
+    }
+    if (unlikely(*s->status != VIRTIO_NET_OK)) {
+        return -EIO;
+    }
+
+    return 0;
+}
+
+static int vhost_vdpa_net_load_vlan(VhostVDPAState *s,
+                                    const VirtIONet *n)
+{
+    int r;
+
+    if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) {
+        return 0;
+    }
+
+    for (int i = 0; i < MAX_VLAN >> 5; i++) {
+        for (int j = 0; n->vlans[i] && j <= 0x1f; j++) {
+            if (n->vlans[i] & (1U << j)) {
+                r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j);
+                if (unlikely(r != 0)) {
+                    return r;
+                }
+            }
+        }
+    }
+
+    return 0;
+}
+
+
 static int vhost_vdpa_net_load(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -995,6 +1040,10 @@ static int vhost_vdpa_net_load(NetClientState *nc)
     if (unlikely(r)) {
         return r;
     }
+    r = vhost_vdpa_net_load_vlan(s, n);
+    if (unlikely(r)) {
+        return r;
+    }
 
     return 0;
 }
--
2.25.1
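For readers unfamiliar with the n->vlans layout that the nested loops above walk: it is a bitmap of MAX_VLAN bits packed into 32-bit words, where VLAN id vid lives at bit (vid & 0x1f) of word (vid >> 5), so (i << 5) + j reconstructs the id. A standalone sketch of the same traversal (illustrative plain C, not the QEMU code):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_VLAN (1 << 12) /* 4096 VLAN ids, per 802.1Q */

int main(void)
{
    /* MAX_VLAN bits packed into 32-bit words: 4096 / 32 = 128 words */
    uint32_t vlans[MAX_VLAN >> 5] = {0};

    /* Mark VLAN id 100 as filtered, the same way the device model does */
    uint16_t vid = 100;
    vlans[vid >> 5] |= 1U << (vid & 0x1f);

    /* Walk the bitmap exactly like vhost_vdpa_net_load_vlan() */
    for (int i = 0; i < MAX_VLAN >> 5; i++) {
        for (int j = 0; vlans[i] && j <= 0x1f; j++) {
            if (vlans[i] & (1U << j)) {
                printf("would restore VLAN id %d\n", (i << 5) + j);
            }
        }
    }
    return 0;
}
```

Note how the inner loop's `vlans[i] &&` condition lets the scan skip an all-zero word early, which matters when only a handful of VLANs are configured.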
[PATCH 0/4] Vhost-vdpa Shadow Virtqueue VLAN support
This series enables the shadowed CVQ to intercept VLAN commands,
updates the virtio NIC device model so QEMU sends the VLAN state
in a migration, and restores that VLAN state at the destination.

TestStep
========
1. test the migration using vp-vdpa device
  - For L0 guest, boot QEMU with two virtio-net-pci net devices with
`ctrl_vq`, `ctrl_vlan` features on, command line like:
      -device virtio-net-pci,disable-legacy=on,disable-modern=off,
iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
indirect_desc=off,queue_reset=off,ctrl_vlan=on,...

  - For L1 guest, apply the patch series and compile the source code,
start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
`ctrl_vlan` features on, command line like:
      -netdev type=vhost-vdpa,x-svq=true,...
      -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
ctrl_vlan=on,...

  - For L2 source guest, run the following bash command:
```bash
#!/bin/sh

for idx in {1..4094}
do
  ip link add link eth0 name vlan$idx type vlan id $idx
done
```

  - Attach gdb to the L2 dest VM, break at
vhost_vdpa_net_load_single_vlan(), and execute the following gdbscript:
```gdbscript
ignore 1 4094
c
```

  - Execute the live migration in L2 source monitor

  - Result
    * with this series, gdb can hit the breakpoint and continue
executing without triggering any error or warning.

Eugenio Pérez (1):
  virtio-net: do not reset vlan filtering at set_features

Hawkins Jiawei (3):
  virtio-net: Expose MAX_VLAN
  vdpa: Restore vlan filtering state
  vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ

 hw/net/virtio-net.c            | 6 +---
 include/hw/virtio/virtio-net.h | 6
 net/vhost-vdpa.c               | 50 ++
 3 files changed, 57 insertions(+), 5 deletions(-)

--
2.25.1
[PATCH 1/4] virtio-net: do not reset vlan filtering at set_features
From: Eugenio Pérez

This function is called after virtio_load, so all the vlan
configuration is lost in the migration case. Just allow all the
vlan-tagged packets if vlan is not configured, and trust the device
reset to clear all filtered vlans.

Fixes: 0b1eaa8803 ("virtio-net: Do not filter VLANs without F_CTRL_VLAN")
Signed-off-by: Eugenio Pérez
Reviewed-by: Hawkins Jiawei
Signed-off-by: Hawkins Jiawei
---
 hw/net/virtio-net.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7102ec4817..d20d5a63cd 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1006,9 +1006,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
         vhost_net_save_acked_features(nc->peer);
     }
 
-    if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
-        memset(n->vlans, 0, MAX_VLAN >> 3);
-    } else {
+    if (!virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
 
--
2.25.1
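A note on the memset(n->vlans, 0xff, MAX_VLAN >> 3) that remains: with every bit in the filter table set, every vlan-tagged packet passes, which is the desired behaviour when VIRTIO_NET_F_CTRL_VLAN is not negotiated. A tiny sketch of that receive-side check (illustrative only, simplified from the device model):

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define MAX_VLAN (1 << 12)

int main(void)
{
    uint32_t vlans[MAX_VLAN >> 5];

    /* No CTRL_VLAN negotiated: set all bits so every VLAN is accepted.
     * MAX_VLAN bits == MAX_VLAN >> 3 bytes. */
    memset(vlans, 0xff, MAX_VLAN >> 3);

    uint16_t vid = 42;
    int pass = vlans[vid >> 5] & (1U << (vid & 0x1f));
    printf("vid %u %s\n", vid, pass ? "passes" : "is filtered");
    return 0;
}
```

Since the table is no longer zeroed when the feature is negotiated, any filter state loaded by virtio_load survives the subsequent set_features call, which is the migration fix.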
[PATCH 2/4] virtio-net: Expose MAX_VLAN
vhost-vdpa shadowed CVQ needs to know the maximum number of vlans
supported by the virtio-net device, so QEMU can restore the VLAN state
in a migration.

Co-developed-by: Eugenio Pérez
Signed-off-by: Eugenio Pérez
Signed-off-by: Hawkins Jiawei
---
 hw/net/virtio-net.c            | 2 --
 include/hw/virtio/virtio-net.h | 6 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d20d5a63cd..a32672039d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -49,8 +49,6 @@
 
 #define VIRTIO_NET_VM_VERSION    11
 
-#define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
-
 /* previously fixed value */
 #define VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE 256
 #define VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE 256

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 5f5dcb4572..93f3bb5d97 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -38,6 +38,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
 /* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */
 #define MAC_TABLE_ENTRIES    64
 
+/*
+ * The maximum number of VLANs in the VLAN filter table
+ * added by VIRTIO_NET_CTRL_VLAN_ADD
+ */
+#define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
+
 typedef struct virtio_net_conf
 {
     uint32_t txtimer;
--
2.25.1
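The shift arithmetic around MAX_VLAN is worth spelling out: the filter table holds MAX_VLAN (4096) one-bit entries, i.e. MAX_VLAN >> 3 = 512 bytes, or MAX_VLAN >> 5 = 128 uint32_t words. A quick compile-time sanity check of those relations (C11, not part of the patch):

```c
#include <stdint.h>
#include <assert.h>

#define MAX_VLAN (1 << 12) /* Per 802.1Q definition */

/* 4096 bits -> 512 bytes -> 128 32-bit words */
static_assert(MAX_VLAN >> 3 == 512, "bytes in the vlan filter table");
static_assert((MAX_VLAN >> 5) * sizeof(uint32_t) == MAX_VLAN >> 3,
              "word count matches byte count");

int main(void) { return 0; }
```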
[PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel
This patchset allows QEMU to delay polling and checking the device
used buffer until either the SVQ is full or the control commands
shadow buffers are full, instead of polling and checking immediately
after sending each SVQ control command, so that QEMU can send all the
SVQ control commands in parallel, which yields a better performance
improvement.

I use the vp_vdpa device to simulate a vdpa device, and create 4094
VLANs in the guest to build a test environment for sending multiple
CVQ state load commands. This patch series can improve the latency
from 10023 us to 8697 us for about 4099 CVQ state load commands,
about 0.32 us per command.

Note that this patch series should be based on
patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].

[1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html

TestStep
========
1. regression testing using vp-vdpa device
  - For L0 guest, boot QEMU with two virtio-net-pci net devices with
`ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
      -device virtio-net-pci,disable-legacy=on,disable-modern=off,
iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...

  - For L1 guest, apply the patch series and compile the source code,
start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
`ctrl_rx`, `ctrl_rx_extra` features on, command line like:
      -netdev type=vhost-vdpa,x-svq=true,...
      -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
ctrl_rx=on,ctrl_rx_extra=on...

  - For L2 source guest, run the following bash command:
```bash
#!/bin/sh

for idx1 in {0..9}
do
  for idx2 in {0..9}
  do
    for idx3 in {0..6}
    do
      ip link add macvlan$idx1$idx2$idx3 link eth0
address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
      ip link set macvlan$idx1$idx2$idx3 up
    done
  done
done
```

  - Execute the live migration in L2 source monitor

  - Result
    * with this series, QEMU should not trigger any error or warning.

2. perf using vp-vdpa device
  - For L0 guest, boot QEMU with two virtio-net-pci net devices with
`ctrl_vq`, `ctrl_vlan` features on, command line like:
      -device virtio-net-pci,disable-legacy=on,disable-modern=off,
iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
indirect_desc=off,queue_reset=off,ctrl_vlan=on,...

  - For L1 guest, apply the patch series, then apply an additional
patch to record the load time in microseconds as following:
```diff
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..501b510fd2 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -295,7 +295,10 @@ static int vhost_net_start_one(struct vhost_net *net,
     }
 
     if (net->nc->info->load) {
+        int64_t start_us = g_get_monotonic_time();
         r = net->nc->info->load(net->nc);
+        error_report("vhost_vdpa_net_load() = %ld us",
+                     g_get_monotonic_time() - start_us);
         if (r < 0) {
             goto fail;
         }
```

  - For L1 guest, compile the code, and start QEMU with two vdpa
devices with svq mode on, enable the `ctrl_vq`, `ctrl_vlan` features
on, command line like:
      -netdev type=vhost-vdpa,x-svq=true,...
      -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
ctrl_vlan=on...

  - For L2 source guest, run the following bash command:
```bash
#!/bin/sh

for idx in {1..4094}
do
  ip link add link eth0 name vlan$idx type vlan id $idx
done
```

  - wait for some time, then execute the live migration in L2 source
monitor

  - Result
    * with this series, QEMU should not trigger any warning or error
except something like "vhost_vdpa_net_load() = 8697 us"
    * without this series, QEMU should not trigger any warning or error
except something like "vhost_vdpa_net_load() = 10023 us"

ChangeLog
=========
v3:
  - refactor vhost_svq_poll() to accept cmds_in_flight, suggested by
Jason and Eugenio
  - refactor vhost_vdpa_net_cvq_add() so that the control commands
buffers are not tied to `s->cvq_cmd_out_buffer` and `s->status` and
can be reused, suggested by Eugenio
  - poll and check when the SVQ is full or the control commands shadow
buffers are full

v2: https://lore.kernel.org/all/cover.1683371965.git.yin31...@gmail.com/
  - recover accidentally deleted rows
  - remove extra newline
  - refactor `need_poll_len` to `cmds_in_flight`
  - return -EINVAL when vhost_svq_poll() returns 0 or the check on
buffers written by the device fails
  - change the type of `in_cursor`, and refactor the code for updating
the cursor
  - return directly when vhost_vdpa_net_load_{mac,mq}() returns a
failure in vhost_vdpa_net_load()

v1: https://lore.kernel.org/all/cover.1681732982.git.yin31...@gmail.com/

Hawkins Jiawei (8):
  vhost: Add argument to vhost_svq_poll()
  vdpa: Use iovec for vhost_vdpa_net_cvq_add()
  vhost: Expose vhost_svq_available_slots()
  vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()
  vdpa: Check device ack
[PATCH v3 1/8] vhost: Add argument to vhost_svq_poll()
Next patches in this series will no longer perform an immediate poll
and check of the device's used buffers for each CVQ state load command.
Instead, they will send CVQ state load commands in parallel by polling
multiple pending buffers at once.

To achieve this, this patch refactors vhost_svq_poll() to accept a new
argument `num`, which allows vhost_svq_poll() to wait for the device to
use multiple elements, rather than polling for a single element.

Signed-off-by: Hawkins Jiawei
---
 hw/virtio/vhost-shadow-virtqueue.c | 36 ++
 hw/virtio/vhost-shadow-virtqueue.h |  2 +-
 net/vhost-vdpa.c                   |  2 +-
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 49e5aed931..e731b1d2ea 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -514,29 +514,37 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 }
 
 /**
- * Poll the SVQ for one device used buffer.
+ * Poll the SVQ to wait for the device to use the specified number
+ * of elements and return the total length written by the device.
  *
  * This function race with main event loop SVQ polling, so extra
  * synchronization is needed.
  *
- * Return the length written by the device.
+ * @svq: The svq
+ * @num: The number of elements that need to be used
  */
-size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
+size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
 {
-    int64_t start_us = g_get_monotonic_time();
-    uint32_t len = 0;
+    size_t len = 0;
+    uint32_t r;
 
-    do {
-        if (vhost_svq_more_used(svq)) {
-            break;
-        }
+    while (num--) {
+        int64_t start_us = g_get_monotonic_time();
 
-        if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
-            return 0;
-        }
-    } while (true);
+        do {
+            if (vhost_svq_more_used(svq)) {
+                break;
+            }
+
+            if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
+                return len;
+            }
+        } while (true);
+
+        vhost_svq_get_buf(svq, &r);
+        len += r;
+    }
 
-    vhost_svq_get_buf(svq, &len);
     return len;
 }

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 6efe051a70..5bce67837b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -119,7 +119,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
                   size_t out_num, const struct iovec *in_sg, size_t in_num,
                   VirtQueueElement *elem);
-size_t vhost_svq_poll(VhostShadowVirtqueue *svq);
+size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index dfd271c456..d1dd140bf6 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -625,7 +625,7 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
      * descriptor. Also, we need to take the answer before SVQ pulls by itself,
      * when BQL is released
      */
-    return vhost_svq_poll(svq);
+    return vhost_svq_poll(svq, 1);
 }
 
 static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
--
2.25.1
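The refactored loop restarts the 10-second timeout for each expected element and, on timeout, returns the length accumulated so far instead of 0. A self-contained sketch of that control flow (stub helpers stand in for the SVQ internals; only the loop structure mirrors the patch):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <time.h>
#include <stdio.h>

/* Stand-ins for the SVQ internals; only the control flow is the point. */
static bool device_has_used_buffer(void) { return true; }  /* assume ready */
static uint32_t take_used_buffer_len(void) { return 1; }   /* 1-byte ack */

static int64_t monotonic_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Same shape as the refactored vhost_svq_poll(): wait for `num` used
 * elements, restarting the timeout for each one, and on timeout return
 * the total length the device has written so far. */
static size_t poll_used(size_t num)
{
    size_t len = 0;

    while (num--) {
        int64_t start_us = monotonic_us();

        while (!device_has_used_buffer()) {
            if (monotonic_us() - start_us > 10 * 1000000LL) {
                return len; /* timed out: report what we got */
            }
        }
        len += take_used_buffer_len();
    }
    return len;
}

int main(void)
{
    printf("polled %zu bytes for 3 commands\n", poll_used(3));
    return 0;
}
```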
[PATCH v3 2/8] vdpa: Use iovec for vhost_vdpa_net_cvq_add()
Next patches in this series will no longer perform an immediate poll
and check of the device's used buffers for each CVQ state load command.
Consequently, there will be multiple pending buffers in the shadow
VirtQueue, making it a must for every control command to have its own
buffer.

To achieve this, this patch refactors vhost_vdpa_net_cvq_add() to
accept `struct iovec`, which eliminates the coupling of control
commands to `s->cvq_cmd_out_buffer` and `s->status`, allowing them
to use their own buffer.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index d1dd140bf6..6b16c8ece0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -596,22 +596,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc)
     vhost_vdpa_net_client_stop(nc);
 }
 
-static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len,
-                                      size_t in_len)
+static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
+                                      struct iovec *out_sg, size_t out_num,
+                                      struct iovec *in_sg, size_t in_num)
 {
-    /* Buffers for the device */
-    const struct iovec out = {
-        .iov_base = s->cvq_cmd_out_buffer,
-        .iov_len = out_len,
-    };
-    const struct iovec in = {
-        .iov_base = s->status,
-        .iov_len = sizeof(virtio_net_ctrl_ack),
-    };
     VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
     int r;
 
-    r = vhost_svq_add(svq, &out, 1, &in, 1, NULL);
+    r = vhost_svq_add(svq, out_sg, out_num, in_sg, in_num, NULL);
     if (unlikely(r != 0)) {
         if (unlikely(r == -ENOSPC)) {
             qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
@@ -637,6 +629,15 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
         .cmd = cmd,
     };
     size_t data_size = iov_size(data_sg, data_num);
+    /* Buffers for the device */
+    struct iovec out = {
+        .iov_base = s->cvq_cmd_out_buffer,
+        .iov_len = sizeof(ctrl) + data_size,
+    };
+    struct iovec in = {
+        .iov_base = s->status,
+        .iov_len = sizeof(*s->status),
+    };
 
     assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
 
@@ -647,8 +648,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     iov_to_buf(data_sg, data_num, 0,
                s->cvq_cmd_out_buffer + sizeof(ctrl), data_size);
 
-    return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl),
-                                  sizeof(virtio_net_ctrl_ack));
+    return vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
 }
 
 static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
@@ -1222,9 +1222,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     struct iovec out = {
         .iov_base = s->cvq_cmd_out_buffer,
     };
-    /* in buffer used for device model */
-    const struct iovec in = {
-        .iov_base = &status,
+    struct iovec in = {
         .iov_len = sizeof(status),
     };
     ssize_t dev_written = -EINVAL;
@@ -1232,6 +1230,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
                              s->cvq_cmd_out_buffer,
                              vhost_vdpa_net_cvq_cmd_page_len());
+    /* In buffer used for the vdpa device */
+    in.iov_base = s->status;
 
     ctrl = s->cvq_cmd_out_buffer;
     if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) {
@@ -1260,7 +1260,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
             goto out;
         }
     } else {
-        dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
+        dev_written = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
         if (unlikely(dev_written < 0)) {
             goto out;
         }
@@ -1276,6 +1276,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     }
 
     status = VIRTIO_NET_ERR;
+    /* In buffer used for the device model */
+    in.iov_base = &status;
     virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, &out, 1);
     if (status != VIRTIO_NET_OK) {
         error_report("Bad CVQ processing in model");
--
2.25.1
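The gain from the iovec-based signature is that a command's out buffer (header plus command-specific data) and in buffer (one-byte ack) become caller-owned, which is what later lets every pending command hold a distinct slice of the shadow buffers. A minimal sketch of packing one command this way (the header struct is simplified from virtio_net_ctrl_hdr; the class/cmd values are made up for illustration):

```c
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <stdio.h>

struct ctrl_hdr {                /* simplified virtio_net_ctrl_hdr */
    uint8_t class;
    uint8_t cmd;
} __attribute__((packed));

int main(void)
{
    uint8_t out_buf[64];
    uint8_t status = 0xff;       /* device writes the 1-byte ack here */

    struct ctrl_hdr hdr = { .class = 0 /* e.g. RX */, .cmd = 1 };
    uint8_t on = 1;

    /* pack the header, then the command-specific data */
    memcpy(out_buf, &hdr, sizeof(hdr));
    memcpy(out_buf + sizeof(hdr), &on, sizeof(on));

    struct iovec out = { .iov_base = out_buf,
                         .iov_len = sizeof(hdr) + sizeof(on) };
    struct iovec in  = { .iov_base = &status, .iov_len = sizeof(status) };

    /* out/in would now be handed to something like vhost_vdpa_net_cvq_add() */
    printf("out %zu bytes, in %zu bytes\n", out.iov_len, in.iov_len);
    return 0;
}
```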
[PATCH v3 4/8] vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()
Next patches in this series will refactor vhost_vdpa_net_load_cmd()
to iterate through the control commands shadow buffers, allowing QEMU
to send CVQ state load commands in parallel at device startup.

Considering that QEMU always forwards the CVQ commands serialized
outside of vhost_vdpa_net_load(), it is more elegant to send the CVQ
commands directly without invoking the vhost_vdpa_net_load_*() helpers.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index dd71008e08..ae8f59adaa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1098,12 +1098,14 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
  */
 static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
                                                        VirtQueueElement *elem,
-                                                       struct iovec *out)
+                                                       struct iovec *out,
+                                                       struct iovec *in)
 {
     struct virtio_net_ctrl_mac mac_data, *mac_ptr;
     struct virtio_net_ctrl_hdr *hdr_ptr;
     uint32_t cursor;
     ssize_t r;
+    uint8_t on = 1;
 
     /* parse the non-multicast MAC address entries from CVQ command */
     cursor = sizeof(*hdr_ptr);
@@ -1151,7 +1153,16 @@ static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
      * filter table to the vdpa device, it should send the
      * VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode
      */
-    r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1);
+    cursor = 0;
+    hdr_ptr = out->iov_base;
+    out->iov_len = sizeof(*hdr_ptr) + sizeof(on);
+    assert(out->iov_len < vhost_vdpa_net_cvq_cmd_page_len());
+
+    hdr_ptr->class = VIRTIO_NET_CTRL_RX;
+    hdr_ptr->cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+    cursor += sizeof(*hdr_ptr);
+    *(uint8_t *)(out->iov_base + cursor) = on;
+    r = vhost_vdpa_net_cvq_add(s, out, 1, in, 1);
     if (unlikely(r < 0)) {
         return r;
     }
@@ -1264,7 +1275,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
          * the CVQ command direclty.
          */
         dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem,
                                                                   &out, &in);
         if (unlikely(dev_written < 0)) {
             goto out;
         }
--
2.25.1
[PATCH v3 8/8] vdpa: Send cvq state load commands in parallel
This patch enables sending CVQ state load commands in parallel at
device startup by the following steps:

  * Refactor vhost_vdpa_net_load_cmd() to iterate through the control
commands shadow buffers. This allows different CVQ state load commands
to use their own unique buffers.

  * Delay the polling and checking of buffers until either the SVQ is
full or the control commands shadow buffers are full.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 157 +--
 1 file changed, 96 insertions(+), 61 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 795c9c1fd2..1ebb58f7f6 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -633,6 +633,26 @@ static uint16_t vhost_vdpa_net_svq_available_slots(VhostVDPAState *s)
     return vhost_svq_available_slots(svq);
 }
 
+/*
+ * Poll SVQ for multiple pending control commands and check the device's ack.
+ *
+ * Caller should hold the BQL when invoking this function.
+ */
+static ssize_t vhost_vdpa_net_svq_flush(VhostVDPAState *s,
+                                        size_t cmds_in_flight)
+{
+    vhost_vdpa_net_svq_poll(s, cmds_in_flight);
+
+    /* Device should and must use only one byte ack each control command */
+    assert(cmds_in_flight < vhost_vdpa_net_cvq_cmd_page_len());
+    for (int i = 0; i < cmds_in_flight; ++i) {
+        if (s->status[i] != VIRTIO_NET_OK) {
+            return -EIO;
+        }
+    }
+    return 0;
+}
+
 static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void **out_cursor,
                                        void **in_cursor, uint8_t class,
                                        uint8_t cmd, const struct iovec *data_sg,
@@ -642,19 +662,41 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void **out_cursor,
         .class = class,
         .cmd = cmd,
     };
-    size_t data_size = iov_size(data_sg, data_num);
+    size_t data_size = iov_size(data_sg, data_num),
+           left_bytes = vhost_vdpa_net_cvq_cmd_page_len() -
+                        (*out_cursor - s->cvq_cmd_out_buffer);
     /* Buffers for the device */
     struct iovec out = {
-        .iov_base = *out_cursor,
         .iov_len = sizeof(ctrl) + data_size,
     };
     struct iovec in = {
-        .iov_base = *in_cursor,
         .iov_len = sizeof(*s->status),
     };
     ssize_t r;
 
-    assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
+    if (sizeof(ctrl) > left_bytes || data_size > left_bytes - sizeof(ctrl) ||
+        vhost_vdpa_net_svq_available_slots(s) < 2) {
+        /*
+         * It is time to flush all pending control commands if SVQ is full
+         * or control commands shadow buffers are full.
+         *
+         * We can poll here since we've had BQL from the time
+         * we sent the descriptor.
+         */
+        r = vhost_vdpa_net_svq_flush(s, *in_cursor - (void *)s->status);
+        if (unlikely(r < 0)) {
+            return r;
+        }
+
+        *out_cursor = s->cvq_cmd_out_buffer;
+        *in_cursor = s->status;
+        left_bytes = vhost_vdpa_net_cvq_cmd_page_len();
+    }
+
+    out.iov_base = *out_cursor;
+    in.iov_base = *in_cursor;
+
+    assert(data_size <= left_bytes - sizeof(ctrl));
 
     /* Each CVQ command has one out descriptor and one in descriptor */
     assert(vhost_vdpa_net_svq_available_slots(s) >= 2);
@@ -670,11 +712,11 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void **out_cursor,
         return r;
     }
 
-    /*
-     * We can poll here since we've had BQL from the time
-     * we sent the descriptor.
-     */
-    return vhost_vdpa_net_svq_poll(s, 1);
+    /* iterate the cursors */
+    *out_cursor += out.iov_len;
+    *in_cursor += in.iov_len;
+
+    return 0;
 }
 
 static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
@@ -685,15 +727,12 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
             .iov_base = (void *)n->mac,
             .iov_len = sizeof(n->mac),
         };
-        ssize_t dev_written = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor,
-                                                      VIRTIO_NET_CTRL_MAC,
-                                                      VIRTIO_NET_CTRL_MAC_ADDR_SET,
-                                                      &data, 1);
-        if (unlikely(dev_written < 0)) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        ssize_t r = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor,
+                                            VIRTIO_NET_CTRL_MAC,
+                                            VIRTIO_NET_CTRL_MAC_ADDR_SET,
+                                            &data, 1);
+        if (unlikely(r < 0)) {
+            return r;
        }
    }

@@ -738,15 +
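The heart of the batching scheme in the patch above: append each command at a cursor in a fixed-size shadow buffer and flush (poll the SVQ and check one ack byte per command) only when the next command would not fit. A standalone sketch of that flush-when-full control flow (the buffer size and all names here are invented for illustration; the real code also flushes when SVQ descriptor pairs run out):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define PAGE_LEN 4096   /* stand-in for vhost_vdpa_net_cvq_cmd_page_len() */

static uint8_t out_buf[PAGE_LEN];
static uint8_t status_buf[PAGE_LEN];

static void *out_cursor = out_buf;
static void *in_cursor  = status_buf;

/* Pretend flush: in the real code this polls the SVQ for all pending
 * commands and checks one ack byte per command, then resets cursors. */
static void flush(void)
{
    size_t pending = (uint8_t *)in_cursor - status_buf;
    printf("flushing %zu pending commands\n", pending);
    out_cursor = out_buf;
    in_cursor = status_buf;
}

static void queue_cmd(const void *cmd, size_t len)
{
    size_t left = PAGE_LEN - ((uint8_t *)out_cursor - out_buf);

    if (len > left) {
        flush();            /* no room: poll everything queued so far */
    }
    memcpy(out_cursor, cmd, len);
    out_cursor = (uint8_t *)out_cursor + len;
    in_cursor  = (uint8_t *)in_cursor + 1;  /* one ack byte per command */
}

int main(void)
{
    uint8_t cmd[8] = {0};
    for (int i = 0; i < 1500; i++) {
        queue_cmd(cmd, sizeof(cmd));
    }
    flush();                /* flush the tail */
    return 0;
}
```

With 8-byte commands and a 4096-byte buffer, this queues 512 commands per flush, so 1500 commands cost three round-trips instead of 1500, which is the whole point of the series.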
[PATCH v3 3/8] vhost: Expose vhost_svq_available_slots()
Next patches in this series will delay the polling and checking of
buffers until either the SVQ is full or the control commands shadow
buffers are full, no longer performing an immediate poll and check of
the device's used buffers for each CVQ state load command.

To achieve this, this patch exposes vhost_svq_available_slots() and
introduces a helper function, allowing QEMU to know whether the SVQ
is full.

Signed-off-by: Hawkins Jiawei
---
 hw/virtio/vhost-shadow-virtqueue.c | 2 +-
 hw/virtio/vhost-shadow-virtqueue.h | 1 +
 net/vhost-vdpa.c                   | 9 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index e731b1d2ea..fc5f408f77 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -66,7 +66,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
  *
  * @svq: The svq
  */
-static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
+uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
 {
     return svq->num_free;
 }

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 5bce67837b..19c842a15b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -114,6 +114,7 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq);
 void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
                          const VirtQueueElement *elem, uint32_t len);
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 6b16c8ece0..dd71008e08 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -620,6 +620,13 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
     return vhost_svq_poll(svq, 1);
 }
 
+/* Convenience wrapper to get number of available SVQ descriptors */
+static uint16_t vhost_vdpa_net_svq_available_slots(VhostVDPAState *s)
+{
+    VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+    return vhost_svq_available_slots(svq);
+}
+
 static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
                                        uint8_t cmd, const struct iovec *data_sg,
                                        size_t data_num)
@@ -640,6 +647,8 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     };
 
     assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
+    /* Each CVQ command has one out descriptor and one in descriptor */
+    assert(vhost_vdpa_net_svq_available_slots(s) >= 2);
 
     /* pack the CVQ command header */
     memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
--
2.25.1
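The reason for the `>= 2` assertion is that each CVQ command consumes exactly one out descriptor and one in descriptor, so the queue is effectively full for this purpose once fewer than two slots remain. A toy model of that accounting (the queue size is an assumption for the sketch):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

static uint16_t num_free = 64;   /* assumed SVQ size for this sketch */

static uint16_t available_slots(void) { return num_free; }

static bool try_add_cvq_cmd(void)
{
    /* Each CVQ command needs one out descriptor and one in descriptor */
    if (available_slots() < 2) {
        return false;            /* caller must poll used buffers first */
    }
    num_free -= 2;
    return true;
}

int main(void)
{
    int queued = 0;
    while (try_add_cvq_cmd()) {
        queued++;
    }
    printf("queued %d commands before the queue filled\n", queued);
    return 0;
}
```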
[PATCH v3 5/8] vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()
Considering that vhost_vdpa_net_load_rx_mode() is now only called
within vhost_vdpa_net_load_rx(), this patch refactors
vhost_vdpa_net_load_rx_mode() to include a check for the device's ack,
simplifying the code and improving its maintainability.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 76
 1 file changed, 31 insertions(+), 45 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ae8f59adaa..fe0ba19724 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -814,14 +814,24 @@ static int vhost_vdpa_net_load_rx_mode(VhostVDPAState *s,
         .iov_base = &on,
         .iov_len = sizeof(on),
     };
-    return vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_RX,
-                                   cmd, &data, 1);
+    ssize_t dev_written;
+
+    dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_RX,
+                                          cmd, &data, 1);
+    if (unlikely(dev_written < 0)) {
+        return dev_written;
+    }
+    if (*s->status != VIRTIO_NET_OK) {
+        return -EIO;
+    }
+
+    return 0;
 }
 
 static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
                                   const VirtIONet *n)
 {
-    ssize_t dev_written;
+    ssize_t r;
 
     if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) {
         return 0;
@@ -846,13 +856,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (!n->mac_table.uni_overflow && !n->promisc) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_PROMISC, 0);
-        if (unlikely(dev_written < 0)) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 0);
+        if (unlikely(r < 0)) {
+            return r;
         }
     }
 
@@ -874,13 +880,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (n->mac_table.multi_overflow || n->allmulti) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_ALLMULTI, 1);
-        if (unlikely(dev_written < 0)) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLMULTI, 1);
+        if (unlikely(r < 0)) {
+            return r;
         }
     }
 
@@ -899,13 +901,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (n->alluni) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_ALLUNI, 1);
-        if (dev_written < 0) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLUNI, 1);
+        if (r < 0) {
+            return r;
         }
     }
 
@@ -920,13 +918,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (n->nomulti) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_NOMULTI, 1);
-        if (dev_written < 0) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOMULTI, 1);
+        if (r < 0) {
+            return r;
         }
     }
 
@@ -941,13 +935,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (n->nouni) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_NOUNI, 1);
-        if (dev_written < 0) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOUNI, 1);
+        if (r < 0) {
+            return r;
         }
     }
 
@@ -962,13 +952,9 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
      * configuration only at live migration.
      */
     if (n->nobcast) {
-        dev_written = vhost_vdpa_net_load_rx_mode(s,
-                                                  VIRTIO_NET_CTRL_RX_NOBCAST, 1);
-        if (dev_written < 0) {
-            return dev_written;
-        }
-        if (*s->status != VIRTIO_NET_OK) {
-            return -EIO;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOBCAST, 1);
+        if (r < 0) {
+            return r;
         }
     }
 
--
2.25.1
[PATCH v3 6/8] vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
This patch moves vhost_svq_poll() to the caller of
vhost_vdpa_net_cvq_add() and introduces a helper function. By making
this change, next patches in this series are able to refactor
vhost_vdpa_net_load_x() to delay the polling and checking process
until either the SVQ is full or the control commands shadow buffers
are full.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 50 ++
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index fe0ba19724..d06f38403f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -609,15 +609,21 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
             qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
                           __func__);
         }
-        return r;
     }
 
-    /*
-     * We can poll here since we've had BQL from the time we sent the
-     * descriptor. Also, we need to take the answer before SVQ pulls by itself,
-     * when BQL is released
-     */
-    return vhost_svq_poll(svq, 1);
+    return r;
+}
+
+/*
+ * Convenience wrapper to poll SVQ for multiple control commands.
+ *
+ * Caller should hold the BQL when invoking this function, and should take
+ * the answer before SVQ pulls by itself when BQL is released.
+ */
+static ssize_t vhost_vdpa_net_svq_poll(VhostVDPAState *s,
+                                       size_t cmds_in_flight)
+{
+    VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
+    return vhost_svq_poll(svq, cmds_in_flight);
 }
 
 /* Convenience wrapper to get number of available SVQ descriptors */
@@ -645,6 +651,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
         .iov_base = s->status,
         .iov_len = sizeof(*s->status),
     };
+    ssize_t r;
 
     assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
     /* Each CVQ command has one out descriptor and one in descriptor */
@@ -657,7 +664,16 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     iov_to_buf(data_sg, data_num, 0,
                s->cvq_cmd_out_buffer + sizeof(ctrl), data_size);
 
-    return vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
+    r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
+    if (unlikely(r < 0)) {
+        return r;
+    }
+
+    /*
+     * We can poll here since we've had BQL from the time
+     * we sent the descriptor.
+     */
+    return vhost_vdpa_net_svq_poll(s, 1);
 }
 
 static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
@@ -1152,6 +1168,12 @@ static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s,
     if (unlikely(r < 0)) {
         return r;
     }
+
+    /*
+     * We can poll here since we've had BQL from the time
+     * we sent the descriptor.
+     */
+    vhost_vdpa_net_svq_poll(s, 1);
     if (*s->status != VIRTIO_NET_OK) {
         return sizeof(*s->status);
     }
@@ -1266,10 +1288,18 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
             goto out;
         }
     } else {
-        dev_written = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
-        if (unlikely(dev_written < 0)) {
+        ssize_t r;
+        r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
+        if (unlikely(r < 0)) {
+            dev_written = r;
             goto out;
         }
+
+        /*
+         * We can poll here since we've had BQL from the time
+         * we sent the descriptor.
+         */
+        dev_written = vhost_vdpa_net_svq_poll(s, 1);
     }
 
     if (unlikely(dev_written < sizeof(status))) {
--
2.25.1
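After this refactor, every call site follows an explicit two-step pattern: queue the command, then poll for its completion. That is precisely the seam later patches exploit to queue several commands before a single poll. A schematic, runnable sketch of the pattern (stubs stand in for the two QEMU helpers; these are not the real signatures):

```c
#include <stdio.h>
#include <sys/types.h>

/* Stubs standing in for vhost_vdpa_net_cvq_add()/vhost_vdpa_net_svq_poll();
 * only the calling pattern is the point of this sketch. */
static ssize_t cvq_add(int cmd)   { (void)cmd; return 0; }  /* queue only */
static ssize_t svq_poll(size_t n) { return n; }             /* wait for n */

int main(void)
{
    /* Old shape: add() polled internally, forcing one round-trip per
     * command. New shape: queue first, poll explicitly afterwards. */
    if (cvq_add(42) < 0) {
        return 1;
    }
    ssize_t dev_written = svq_poll(1);  /* poll exactly one used element */
    printf("device used %zd element(s)\n", dev_written);
    return 0;
}
```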
[PATCH v3 7/8] vdpa: Add cursors to vhost_vdpa_net_loadx()
This patch adds `out_cursor` and `in_cursor` arguments to
vhost_vdpa_net_load_x(). By making this change, next patches in this
series can refactor vhost_vdpa_net_load_cmd() directly to iterate
through the control commands shadow buffers, allowing QEMU to send CVQ
state load commands in parallel at device startup.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 79 ++
 1 file changed, 50 insertions(+), 29 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index d06f38403f..795c9c1fd2 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -633,7 +633,8 @@ static uint16_t vhost_vdpa_net_svq_available_slots(VhostVDPAState *s)
     return vhost_svq_available_slots(svq);
 }
 
-static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
+static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void **out_cursor,
+                                       void **in_cursor, uint8_t class,
                                        uint8_t cmd, const struct iovec *data_sg,
                                        size_t data_num)
 {
@@ -644,11 +645,11 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     size_t data_size = iov_size(data_sg, data_num);
     /* Buffers for the device */
     struct iovec out = {
-        .iov_base = s->cvq_cmd_out_buffer,
+        .iov_base = *out_cursor,
         .iov_len = sizeof(ctrl) + data_size,
     };
     struct iovec in = {
-        .iov_base = s->status,
+        .iov_base = *in_cursor,
         .iov_len = sizeof(*s->status),
     };
     ssize_t r;
@@ -658,11 +659,11 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     assert(vhost_vdpa_net_svq_available_slots(s) >= 2);
 
     /* pack the CVQ command header */
-    memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
+    memcpy(out.iov_base, &ctrl, sizeof(ctrl));
 
     /* pack the CVQ command command-specific-data */
     iov_to_buf(data_sg, data_num, 0,
-               s->cvq_cmd_out_buffer + sizeof(ctrl), data_size);
+               out.iov_base + sizeof(ctrl), data_size);
 
     r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1);
     if (unlikely(r < 0)) {
@@ -676,14 +677,16 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
     return vhost_vdpa_net_svq_poll(s, 1);
 }
 
-static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
+static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
+                                   void **out_cursor, void **in_cursor)
 {
     if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
         const struct iovec data = {
             .iov_base = (void *)n->mac,
             .iov_len = sizeof(n->mac),
         };
-        ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
+        ssize_t dev_written = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor,
+                                                      VIRTIO_NET_CTRL_MAC,
                                                       VIRTIO_NET_CTRL_MAC_ADDR_SET,
                                                       &data, 1);
         if (unlikely(dev_written < 0)) {
@@ -735,7 +738,7 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
             .iov_len = mul_macs_size,
         },
     };
-    ssize_t dev_written = vhost_vdpa_net_load_cmd(s,
+    ssize_t dev_written = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor,
                                                   VIRTIO_NET_CTRL_MAC,
                                                   VIRTIO_NET_CTRL_MAC_TABLE_SET,
                                                   data, ARRAY_SIZE(data));
@@ -750,7 +753,8 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
 }
 
 static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
-                                  const VirtIONet *n)
+                                  const VirtIONet *n,
+                                  void **out_cursor, void **in_cursor)
 {
     struct virtio_net_ctrl_mq mq;
     ssize_t dev_written;
@@ -764,7 +768,8 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
         .iov_base = &mq,
         .iov_len = sizeof(mq),
     };
-    dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ,
+    dev_written = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor,
+                                          VIRTIO_NET_CTRL_MQ,
                                           VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
                                           &data, 1);
     if (unlikely(dev_written < 0)) {
@@ -778,7 +783,8 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
 }
 
 static int vhost_vdpa_net_load_offloads(VhostVDPAState *s,
-                                        const VirtIONet *n)
+                                        const VirtIONet *n,
+                                        void **out_cursor, void **in_cursor)
 {
     uint64_
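The `void **out_cursor` / `void **in_cursor` pair threaded through the patch above is simply a write position that each helper dereferences (and, in the next patch, advances), so successive commands land back-to-back in shared buffers. A minimal sketch of this cursor-threading style (illustrative names only):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

static uint8_t out_buf[128];

/* Writes one record at *cursor and advances the cursor past it,
 * mirroring how vhost_vdpa_net_load_cmd() treats out_cursor. */
static void emit(void **cursor, const void *data, size_t len)
{
    memcpy(*cursor, data, len);
    *cursor = (uint8_t *)*cursor + len;
}

int main(void)
{
    void *cursor = out_buf;
    emit(&cursor, "cmd1", 4);
    emit(&cursor, "cmd2", 4);   /* lands directly after the first record */
    printf("used %td bytes\n", (uint8_t *)cursor - out_buf);
    return 0;
}
```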
Re: [PATCH 0/4] Vhost-vdpa Shadow Virtqueue VLAN support
On 2023/7/19 15:47, Hawkins Jiawei wrote:
> This series enables the shadowed CVQ to intercept VLAN commands,
> updates the virtio NIC device model so QEMU sends the VLAN state
> in a migration, and restores that VLAN state at the destination.

This patch series is based on "[PATCH 0/3] Vhost-vdpa Shadow
Virtqueue VLAN support" at [1], with these changes:
  - move the `MAX_VLAN` macro to include/hw/virtio/virtio-net.h
instead of net/vhost-vdpa.c
  - fix conflicts with the master branch

Thanks!

[1]. https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01016.html

> 
> TestStep
> ========
> 1. test the migration using vp-vdpa device
>    - For L0 guest, boot QEMU with two virtio-net-pci net devices with
> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
> 
>    - For L1 guest, apply the patch series and compile the source code,
> start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
> `ctrl_vlan` features on, command line like:
>        -netdev type=vhost-vdpa,x-svq=true,...
>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
> ctrl_vlan=on,...
> 
>    - For L2 source guest, run the following bash command:
> ```bash
> #!/bin/sh
> 
> for idx in {1..4094}
> do
>    ip link add link eth0 name vlan$idx type vlan id $idx
> done
> ```
> 
>    - Attach gdb to the L2 dest VM, break at
> vhost_vdpa_net_load_single_vlan(), and execute the following
> gdbscript:
> ```gdbscript
> ignore 1 4094
> c
> ```
> 
>    - Execute the live migration in L2 source monitor
> 
>    - Result
>      * with this series, gdb can hit the breakpoint and continue
> executing without triggering any error or warning.
> 
> Eugenio Pérez (1):
>    virtio-net: do not reset vlan filtering at set_features
> 
> Hawkins Jiawei (3):
>    virtio-net: Expose MAX_VLAN
>    vdpa: Restore vlan filtering state
>    vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
> 
>   hw/net/virtio-net.c            | 6 +---
>   include/hw/virtio/virtio-net.h | 6
>   net/vhost-vdpa.c               | 50 ++
>   3 files changed, 57 insertions(+), 5 deletions(-)
> 
Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel
On 2023/7/19 17:11, Michael S. Tsirkin wrote:
> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
>> This patchset allows QEMU to delay polling and checking the device
>> used buffer until either the SVQ is full or the control commands
>> shadow buffers are full, instead of polling and checking immediately
>> after sending each SVQ control command, so that QEMU can send all
>> the SVQ control commands in parallel, which yields a better
>> performance improvement.
>>
>> I use the vp_vdpa device to simulate a vdpa device, and create 4094
>> VLANs in the guest to build a test environment for sending multiple
>> CVQ state load commands. This patch series can improve the latency
>> from 10023 us to 8697 us for about 4099 CVQ state load commands,
>> about 0.32 us per command.
> 
> Looks like a tiny improvement.
> At the same time we have O(n^2) behaviour with memory mappings.

Hi Michael,

Thanks for your review.

I wonder why you say "we have O(n^2) behaviour on memory mappings"
here?

From my understanding, QEMU maps two page-size buffers as the control
commands shadow buffers at device startup. These buffers are then used
to cache the SVQ control commands: QEMU fills them with multiple SVQ
control command bytes and flushes them when the SVQ descriptors are
full or these control commands shadow buffers reach their capacity.

QEMU repeats this process until all CVQ state load commands have been
sent during loading.

In this loading process, only the translation of the control commands
shadow buffers should involve the memory mappings, which should be
O(log n) behaviour to my understanding (please correct me if I am
wrong).

> Not saying we must not do this but I think it's worth
> checking where the bottleneck is. My guess would be
> vp_vdpa is not doing things in parallel. Want to try fixing that

As for "vp_vdpa is not doing things in parallel.", do you mean
the vp_vdpa device cannot process QEMU's SVQ control commands
in parallel?

In this situation, I will try to use real vdpa hardware to
test the patch series performance.

> to see how far it can be pushed?

Currently, I am involved in the "Add virtio-net Control Virtqueue state
restore support" project in Google Summer of Code. Because I am
uncertain about the time it will take to fix that problem in the
vp_vdpa device, I prefer to complete the GSoC project first.

Thanks!

> 
> 
>> Note that this patch series should be based on
>> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
>>
>> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
>>
>> TestStep
>> ========
>> 1. regression testing using vp-vdpa device
>>    - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
>>
>>    - For L1 guest, apply the patch series and compile the source code,
>> start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
>> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>        -netdev type=vhost-vdpa,x-svq=true,...
>>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
>> ctrl_rx=on,ctrl_rx_extra=on...
>>
>>    - For L2 source guest, run the following bash command:
>> ```bash
>> #!/bin/sh
>>
>> for idx1 in {0..9}
>> do
>>    for idx2 in {0..9}
>>    do
>>      for idx3 in {0..6}
>>      do
>>        ip link add macvlan$idx1$idx2$idx3 link eth0
>> address 4a:30:10:19:$idx1$idx2:1$idx3 type macvlan mode bridge
>>        ip link set macvlan$idx1$idx2$idx3 up
>>      done
>>    done
>> done
>> ```
>>    - Execute the live migration in L2 source monitor
>>
>>    - Result
>>      * with this series, QEMU should not trigger any error or warning.
>>
>>
>>
>> 2. perf using vp-vdpa device
>>    - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>> `ctrl_vq`, `ctrl_vlan` features on, command line like:
>>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>> indirect_desc=off,queue_reset=off,ctrl_vlan=on,...
>>
>>    - For L1 guest, apply the patch series, then apply an additional
>> patch to record the load time in microseconds as following:
>> ```diff
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index 6b958d6363..501b510fd2 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
> 
Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel
On 2023/7/19 20:46, Michael S. Tsirkin wrote:
> On Wed, Jul 19, 2023 at 08:35:50PM +0800, Hawkins Jiawei wrote:
>> On 2023/7/19 17:11, Michael S. Tsirkin wrote:
>>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
>>>> This patchset allows QEMU to delay polling and checking the device
>>>> used buffer until either the SVQ is full or the control commands
>>>> shadow buffers are full, instead of polling and checking immediately
>>>> after sending each SVQ control command, so that QEMU can send all
>>>> the SVQ control commands in parallel, which yields a better
>>>> performance improvement.
>>>>
>>>> I use the vp_vdpa device to simulate a vdpa device, and create 4094
>>>> VLANs in the guest to build a test environment for sending multiple
>>>> CVQ state load commands. This patch series can improve the latency
>>>> from 10023 us to 8697 us for about 4099 CVQ state load commands,
>>>> about 0.32 us per command.
>>>
>>> Looks like a tiny improvement.
>>> At the same time we have O(n^2) behaviour with memory mappings.
>>
>> Hi Michael,
>>
>> Thanks for your review.
>>
>> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
> 
> it's not specific to virtio - it's related to device init.
> generally each device has some memory. during boot bios
> enables each individually O(n) where n is # of devices.
> memory maps have to be updated and in qemu this update
> is at least superlinear with n (more like O(n log n) I think).
> This gets up to O(n^2) with n the number of devices.

Thanks for your explanation.

> 
>> From my understanding, QEMU maps two page-size buffers as the control
>> commands shadow buffers at device startup. These buffers are then used
>> to cache the SVQ control commands: QEMU fills them with multiple SVQ
>> control command bytes and flushes them when the SVQ descriptors are
>> full or these control commands shadow buffers reach their capacity.
>>
>> QEMU repeats this process until all CVQ state load commands have been
>> sent during loading.
>>
>> In this loading process, only the translation of the control commands
>> shadow buffers should involve the memory mappings, which should be
>> O(log n) behaviour to my understanding (please correct me if I am
>> wrong).
>>
>>> Not saying we must not do this but I think it's worth
>>> checking where the bottleneck is. My guess would be
>>> vp_vdpa is not doing things in parallel. Want to try fixing that
>>
>> As for "vp_vdpa is not doing things in parallel.", do you mean
>> the vp_vdpa device cannot process QEMU's SVQ control commands
>> in parallel?
>>
>> In this situation, I will try to use real vdpa hardware to
>> test the patch series performance.
> 
> yea, pls do that.
> 
>>> to see how far it can be pushed?
>>
>> Currently, I am involved in the "Add virtio-net Control Virtqueue state
>> restore support" project in Google Summer of Code. Because I am
>> uncertain about the time it will take to fix that problem in the
>> vp_vdpa device, I prefer to complete the GSoC project first.
>>
>> Thanks!
>>
>>>
>>>
>>>> Note that this patch series should be based on
>>>> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
>>>>
>>>> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
>>>>
>>>> TestStep
>>>> ========
>>>> 1. regression testing using vp-vdpa device
>>>>    - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>>>> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>>>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>>>> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
>>>>
>>>>    - For L1 guest, apply the patch series and compile the source code,
>>>> start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
>>>> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>        -netdev type=vhost-vdpa,x-svq=true,...
>>>>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
>>>> ctrl_rx=on,ctrl_rx_extra=on...
>>>>
>>>>    - For L2 source guest, run the following bash command:
>>>> ```bash
>>>> #!/bin/sh
>>>>
>>>> for idx1 in {0..9}
>>>> do
>>
Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel
On 2023/7/19 20:44, Lei Yang wrote:
> Hello Hawkins and Michael
> 
> Looks like there are big changes about vp_vdpa, therefore, if needed,
> QE can test this series in QE's environment before the patch is

Hi Lei,

This patch series does not modify the code of vp_vdpa. Instead, it only
modifies how QEMU sends SVQ control commands to the vdpa device.

Considering that the behavior of the vp_vdpa device differs from that
of real vdpa hardware, would it be possible for you to test this patch
series on a real vdpa device?

Thanks!

> merged, and provide the result.
> 
> BR
> Lei
> 
> 
> On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei wrote:
>>
>> On 2023/7/19 17:11, Michael S. Tsirkin wrote:
>>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
>>>> This patchset allows QEMU to delay polling and checking the device
>>>> used buffer until either the SVQ is full or the control commands
>>>> shadow buffers are full, instead of polling and checking immediately
>>>> after sending each SVQ control command, so that QEMU can send all
>>>> the SVQ control commands in parallel, which yields a better
>>>> performance improvement.
>>>>
>>>> I use the vp_vdpa device to simulate a vdpa device, and create 4094
>>>> VLANs in the guest to build a test environment for sending multiple
>>>> CVQ state load commands. This patch series can improve the latency
>>>> from 10023 us to 8697 us for about 4099 CVQ state load commands,
>>>> about 0.32 us per command.
>>>
>>> Looks like a tiny improvement.
>>> At the same time we have O(n^2) behaviour with memory mappings.
>>
>> Hi Michael,
>>
>> Thanks for your review.
>>
>> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
>>
>> From my understanding, QEMU maps two page-size buffers as the control
>> commands shadow buffers at device startup. These buffers are then used
>> to cache the SVQ control commands: QEMU fills them with multiple SVQ
>> control command bytes and flushes them when the SVQ descriptors are
>> full or these control commands shadow buffers reach their capacity.
>>
>> QEMU repeats this process until all CVQ state load commands have been
>> sent during loading.
>>
>> In this loading process, only the translation of the control commands
>> shadow buffers should involve the memory mappings, which should be
>> O(log n) behaviour to my understanding (please correct me if I am
>> wrong).
>>
>>> Not saying we must not do this but I think it's worth
>>> checking where the bottleneck is. My guess would be
>>> vp_vdpa is not doing things in parallel. Want to try fixing that
>>
>> As for "vp_vdpa is not doing things in parallel.", do you mean
>> the vp_vdpa device cannot process QEMU's SVQ control commands
>> in parallel?
>>
>> In this situation, I will try to use real vdpa hardware to
>> test the patch series performance.
>>
>>> to see how far it can be pushed?
>>
>> Currently, I am involved in the "Add virtio-net Control Virtqueue state
>> restore support" project in Google Summer of Code. Because I am
>> uncertain about the time it will take to fix that problem in the
>> vp_vdpa device, I prefer to complete the GSoC project first.
>>
>> Thanks!
>>
>>>
>>>
>>>> Note that this patch series should be based on
>>>> patch "Vhost-vdpa Shadow Virtqueue VLAN support" at [1].
>>>>
>>>> [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03719.html
>>>>
>>>> TestStep
>>>> ========
>>>> 1. regression testing using vp-vdpa device
>>>>    - For L0 guest, boot QEMU with two virtio-net-pci net devices with
>>>> `ctrl_vq`, `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>        -device virtio-net-pci,disable-legacy=on,disable-modern=off,
>>>> iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off,
>>>> indirect_desc=off,queue_reset=off,ctrl_rx=on,ctrl_rx_extra=on,...
>>>>
>>>>    - For L1 guest, apply the patch series and compile the source code,
>>>> start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`,
>>>> `ctrl_rx`, `ctrl_rx_extra` features on, command line like:
>>>>        -netdev type=vhost-vdpa,x-svq=true,...
>>>>        -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on,
>>>> ctrl_rx=on,ctrl_rx_extra=on...
>>>>
>>>>    - For L2 source guest, run the following ba
Re: [PATCH v3 0/8] vdpa: Send all CVQ state load commands in parallel
On 2023/7/20 16:53, Lei Yang wrote:
> According to the steps Hawkins provided, I tested two cases, with this
> series of patches applied and without it. And all tests are based on
> the real hardware.
> Case 1, without this series
> Source: qemu-system-x86_64: vhost_vdpa_net_load() = 23308 us
> Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 23296 us
> 
> Case 2, applied this series
> Source: qemu-system-x86_64: vhost_vdpa_net_load() = 6558 us
> Dest: qemu-system-x86_64: vhost_vdpa_net_load() = 6539 us
> 
> Tested-by: Lei Yang
> 
> 
> On Thu, Jul 20, 2023 at 6:54 AM Lei Yang wrote:
>>
>> On Wed, Jul 19, 2023 at 11:25 PM Hawkins Jiawei wrote:
>>>
>>> On 2023/7/19 20:44, Lei Yang wrote:
>>>> Hello Hawkins and Michael
>>>>
>>>> Looks like there are big changes about vp_vdpa, therefore, if needed,
>>>> QE can test this series in QE's environment before the patch is
>>>
>>> Hi Lei,
>>>
>>> This patch series does not modify the code of vp_vdpa. Instead, it only
>>> modifies how QEMU sends SVQ control commands to the vdpa device.
>>>
>> Hi Hawkins
>>
>>> Considering that the behavior of the vp_vdpa device differs from that
>>> of real vdpa hardware, would it be possible for you to test this patch
>>> series on a real vdpa device?
>>
>> Yes, there is a hardware device to test it, I will update the test
>> results ASAP.
>>
>> BR
>> Lei
>>>
>>> Thanks!
>>>
>>>
>>>> merged, and provide the result.
>>>>
>>>> BR
>>>> Lei
>>>>
>>>>
>>>> On Wed, Jul 19, 2023 at 8:37 PM Hawkins Jiawei wrote:
>>>>>
>>>>> On 2023/7/19 17:11, Michael S. Tsirkin wrote:
>>>>>> On Wed, Jul 19, 2023 at 03:53:45PM +0800, Hawkins Jiawei wrote:
>>>>>>> This patchset allows QEMU to delay polling and checking the device
>>>>>>> used buffer until either the SVQ is full or the control commands
>>>>>>> shadow buffers are full, instead of polling and checking immediately
>>>>>>> after sending each SVQ control command, so that QEMU can send all
>>>>>>> the SVQ control commands in parallel, which yields a better
>>>>>>> performance improvement.
>>>>>>>
>>>>>>> I use the vp_vdpa device to simulate a vdpa device, and create 4094
>>>>>>> VLANs in the guest to build a test environment for sending multiple
>>>>>>> CVQ state load commands. This patch series can improve the latency
>>>>>>> from 10023 us to 8697 us for about 4099 CVQ state load commands,
>>>>>>> about 0.32 us per command.

It appears that the performance timing with the vp_vdpa device in this
method is not consistently stable. I retested this patch series with
the same steps with the vp_vdpa device.

With this patch series, in the majority of cases, the time for CVQ
state load commands is around 8000 us ~ 10000 us.

Without this patch series, in the majority of cases, the time for CVQ
state load commands is around 14000 us ~ 20000 us.

Thanks!

>>>>>>
>>>>>> Looks like a tiny improvement.
>>>>>> At the same time we have O(n^2) behaviour with memory mappings.
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Thanks for your review.
>>>>>
>>>>> I wonder why you say "we have O(n^2) behaviour on memory mappings" here?
>>>>>
>>>>> From my understanding, QEMU maps two page-size buffers as the control
>>>>> commands shadow buffers at device startup. These buffers are then used
>>>>> to cache the SVQ control commands: QEMU fills them with multiple SVQ
>>>>> control command bytes and flushes them when the SVQ descriptors are
>>>>> full or these control commands shadow buffers reach their capacity.
>>>>>
>>>>> QEMU repeats this process until all CVQ state load commands have been
>>>>> sent during loading.
>>>>>
>>>>> In this loading process, only the translation of the control commands
>>>>> shadow buffers should involve the memory mappings, which should be
>>>>> O(log n) behaviour to my understanding (please correct me if I am
>>>>> wrong).
>>>>>
>>>>>> Not saying we must not do this but I think it's worth
>>>>>> checking where the bottleneck is. My guess would be
>>>>>> vp_vdpa is not doing th
Re: [PATCH 3/4] vdpa: Restore vlan filtering state
On 2023/7/21 19:57, Eugenio Perez Martin wrote:
> On Wed, Jul 19, 2023 at 9:48 AM Hawkins Jiawei wrote:
>>
>> This patch introduces vhost_vdpa_net_load_single_vlan()
>> and vhost_vdpa_net_load_vlan() to restore the vlan
>> filtering state at the device's startup.
>>
>> Co-developed-by: Eugenio Pérez
>> Signed-off-by: Eugenio Pérez
>> Signed-off-by: Hawkins Jiawei
>> ---
>>  net/vhost-vdpa.c | 49
>>  1 file changed, 49 insertions(+)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index 9795306742..0787dd933b 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -965,6 +965,51 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
>>      return 0;
>>  }
>>
>> +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s,
>> +                                           const VirtIONet *n,
>> +                                           uint16_t vid)
>> +{
>> +    const struct iovec data = {
>> +        .iov_base = &vid,
>> +        .iov_len = sizeof(vid),
>> +    };
>> +    ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN,
>> +                                                  VIRTIO_NET_CTRL_VLAN_ADD,
>> +                                                  &data, 1);
>> +    if (unlikely(dev_written < 0)) {
>> +        return dev_written;
>> +    }
>> +    if (unlikely(*s->status != VIRTIO_NET_OK)) {
>> +        return -EIO;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s,
>> +                                    const VirtIONet *n)
>> +{
>> +    int r;
>> +
>> +    if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) {
>> +        return 0;
>> +    }
>> +
>> +    for (int i = 0; i < MAX_VLAN >> 5; i++) {
>> +        for (int j = 0; n->vlans[i] && j <= 0x1f; j++) {
>> +            if (n->vlans[i] & (1U << j)) {
>> +                r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j);
>> +                if (unlikely(r != 0)) {
>> +                    return r;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +
> 
> Nit: I'm not sure if it was here originally, but there is an extra newline
> here.

Hi Eugenio,

It was not here originally; it was introduced mistakenly during the
refactoring process.

I will fix this in the v2 version.

Thanks!

> 
>>  static int vhost_vdpa_net_load(NetClientState *nc)
>>  {
>>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>> @@ -995,6 +1040,10 @@ static int vhost_vdpa_net_load(NetClientState *nc)
>>      if (unlikely(r)) {
>>          return r;
>>      }
>> +    r = vhost_vdpa_net_load_vlan(s, n);
>> +    if (unlikely(r)) {
>> +        return r;
>> +    }
>>
>>      return 0;
>>  }
>> --
>> 2.25.1
>>
> 
[PATCH v2 0/4] Vhost-vdpa Shadow Virtqueue VLAN support
This series enables the shadowed CVQ to intercept VLAN commands, updates the virtio NIC device model so that QEMU sends them in a migration, and restores that VLAN state in the destination. ChangeLog = v2: - remove the extra line pointed out by Eugenio in patch 3 "vdpa: Restore vlan filtering state" v1: https://lore.kernel.org/all/cover.1689690854.git.yin31...@gmail.com/ - based on patch "[PATCH 0/3] Vhost-vdpa Shadow Virtqueue VLAN support" at https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01016.html - move `MAX_VLAN` macro to include/hw/virtio/virtio-net.h instead of net/vhost-vdpa.c - fix conflicts with the master branch TestStep 1. test the migration using the vp-vdpa device - For the L0 guest, boot QEMU with two virtio-net-pci net devices with the `ctrl_vq` and `ctrl_vlan` features on, with a command line like: -device virtio-net-pci,disable-legacy=on,disable-modern=off, iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off, indirect_desc=off,queue_reset=off,ctrl_vlan=on,... - For the L1 guest, apply the patch series and compile the source code, then start QEMU with two vdpa devices in svq mode with the `ctrl_vq` and `ctrl_vlan` features on, with a command line like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on, ctrl_vlan=on,... - For the L2 source guest, run the following bash command: ```bash #!/bin/sh for idx in {1..4094} do ip link add link eth0 name vlan$idx type vlan id $idx done ``` - Attach gdb to the L2 dest VM, break at vhost_vdpa_net_load_single_vlan(), and execute the following gdbscript: ```gdbscript ignore 1 4094 c ``` - Execute the live migration in the L2 source monitor - Result * with this series, gdb can hit the breakpoint and continue execution without triggering any error or warning. Eugenio Pérez (1): virtio-net: do not reset vlan filtering at set_features Hawkins Jiawei (3): virtio-net: Expose MAX_VLAN vdpa: Restore vlan filtering state vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ hw/net/virtio-net.c| 6 + include/hw/virtio/virtio-net.h | 6 + net/vhost-vdpa.c | 49 ++ 3 files changed, 56 insertions(+), 5 deletions(-) -- 2.25.1
[PATCH v2 2/4] virtio-net: Expose MAX_VLAN
vhost-vdpa shadowed CVQ needs to know the maximum number of vlans supported by the virtio-net device, so QEMU can restore the VLAN state in a migration. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- hw/net/virtio-net.c| 2 -- include/hw/virtio/virtio-net.h | 6 ++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index d20d5a63cd..a32672039d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -49,8 +49,6 @@ #define VIRTIO_NET_VM_VERSION11 -#define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ - /* previously fixed value */ #define VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE 256 #define VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE 256 diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index 5f5dcb4572..93f3bb5d97 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -38,6 +38,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET) /* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */ #define MAC_TABLE_ENTRIES64 +/* + * The maximum number of VLANs in the VLAN filter table + * added by VIRTIO_NET_CTRL_VLAN_ADD + */ +#define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ + typedef struct virtio_net_conf { uint32_t txtimer; -- 2.25.1
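For orientation, the table that MAX_VLAN sizes is a plain bitmap of 4096 bits: MAX_VLAN >> 5 32-bit words, or the MAX_VLAN >> 3 bytes that virtio_net_set_features() memsets. Below is a minimal standalone sketch (not QEMU code) of the indexing that the restore loop in patch 3/4 relies on:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */

/* 4096 bits = MAX_VLAN >> 5 (128) 32-bit words = MAX_VLAN >> 3 (512) bytes */
static uint32_t vlans[MAX_VLAN >> 5];

/* Mark a VLAN id as allowed, as VIRTIO_NET_CTRL_VLAN_ADD does */
static void vlan_add(uint16_t vid)
{
    vlans[vid >> 5] |= 1U << (vid & 0x1f);
}

/* Test whether a VLAN id passes the filter; note vid == (i << 5) + j for
 * word index i and bit index j, matching the nested loop in patch 3/4 */
static bool vlan_allowed(uint16_t vid)
{
    return vlans[vid >> 5] & (1U << (vid & 0x1f));
}
```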
[PATCH v2 1/4] virtio-net: do not reset vlan filtering at set_features
From: Eugenio Pérez This function is called after virtio_load, so all vlan configuration is lost in migration case. Just allow all the vlan-tagged packets if vlan is not configured, and trust device reset to clear all filtered vlans. Fixes: 0b1eaa8803 ("virtio-net: Do not filter VLANs without F_CTRL_VLAN") Signed-off-by: Eugenio Pérez Reviewed-by: Hawkins Jiawei Signed-off-by: Hawkins Jiawei --- hw/net/virtio-net.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7102ec4817..d20d5a63cd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1006,9 +1006,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features) vhost_net_save_acked_features(nc->peer); } -if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { -memset(n->vlans, 0, MAX_VLAN >> 3); -} else { +if (!virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { memset(n->vlans, 0xff, MAX_VLAN >> 3); } -- 2.25.1
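The effect of the 0xff fill can be seen in the receive path, which consults this table per packet. Roughly, receive_filter() in hw/net/virtio-net.c does the following (a simplified sketch, not the verbatim function), so an all-ones table lets every vlan-tagged packet through:

```c
/* Simplified sketch of the VLAN check in receive_filter()
 * (hw/net/virtio-net.c); stripping and the other filters are elided. */
static int vlan_filter_pass(VirtIONet *n, const uint8_t *buf)
{
    static const uint8_t vlan[] = {0x81, 0x00};     /* TPID 0x8100 */

    if (!memcmp(&buf[12], vlan, sizeof(vlan))) {
        int vid = lduw_be_p(buf + 14) & 0xfff;      /* 12-bit VLAN id */
        if (!(n->vlans[vid >> 5] >> (vid & 0x1f) & 1)) {
            return 0;                               /* packet filtered out */
        }
    }
    return 1;                                       /* packet accepted */
}
```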
[PATCH v2 4/4] vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 347241796d..73e9063fa0 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -111,6 +111,7 @@ static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_STATUS) | BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | BIT_ULL(VIRTIO_NET_F_CTRL_RX) | +BIT_ULL(VIRTIO_NET_F_CTRL_VLAN) | BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT_ULL(VIRTIO_NET_F_MQ) | BIT_ULL(VIRTIO_F_ANY_LAYOUT) | -- 2.25.1
[PATCH v2 3/4] vdpa: Restore vlan filtering state
This patch introduces vhost_vdpa_net_load_single_vlan() and vhost_vdpa_net_load_vlan() to restore the vlan filtering state at device's startup. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- v2: - remove the extra line pointed out by Eugenio v1: https://lore.kernel.org/all/0a568cc8a8d2b750c2e09b2237e9f05cece07c3f.1689690854.git.yin31...@gmail.com/ net/vhost-vdpa.c | 48 1 file changed, 48 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 9795306742..347241796d 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -965,6 +965,50 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s, return 0; } +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s, + const VirtIONet *n, + uint16_t vid) +{ +const struct iovec data = { +.iov_base = &vid, +.iov_len = sizeof(vid), +}; +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN, + VIRTIO_NET_CTRL_VLAN_ADD, + &data, 1); +if (unlikely(dev_written < 0)) { +return dev_written; +} +if (unlikely(*s->status != VIRTIO_NET_OK)) { +return -EIO; +} + +return 0; +} + +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s, +const VirtIONet *n) +{ +int r; + +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) { +return 0; +} + +for (int i = 0; i < MAX_VLAN >> 5; i++) { +for (int j = 0; n->vlans[i] && j <= 0x1f; j++) { +if (n->vlans[i] & (1U << j)) { +r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j); +if (unlikely(r != 0)) { +return r; +} +} +} +} + +return 0; +} + static int vhost_vdpa_net_load(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -995,6 +1039,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) if (unlikely(r)) { return r; } +r = vhost_vdpa_net_load_vlan(s, n); +if (unlikely(r)) { +return r; +} return 0; } -- 2.25.1
Re: [PATCH v2 1/4] virtio-net: do not reset vlan filtering at set_features
On 2023/7/23 17:26, Hawkins Jiawei wrote: > From: Eugenio Pérez A wrong "From" line was added by mistake; I will send a v3 patch to fix this. Thanks! > > This function is called after virtio_load, so all vlan configuration is > lost in migration case. > > Just allow all the vlan-tagged packets if vlan is not configured, and > trust device reset to clear all filtered vlans. > > Fixes: 0b1eaa8803 ("virtio-net: Do not filter VLANs without F_CTRL_VLAN") > Signed-off-by: Eugenio Pérez > Reviewed-by: Hawkins Jiawei > Signed-off-by: Hawkins Jiawei > --- > hw/net/virtio-net.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index 7102ec4817..d20d5a63cd 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -1006,9 +1006,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, > uint64_t features) > vhost_net_save_acked_features(nc->peer); > } > > -if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { > -memset(n->vlans, 0, MAX_VLAN >> 3); > -} else { > +if (!virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { > memset(n->vlans, 0xff, MAX_VLAN >> 3); > } >
[PATCH v3 0/4] Vhost-vdpa Shadow Virtqueue VLAN support
This series enables the shadowed CVQ to intercept VLAN commands, updates the virtio NIC device model so that QEMU sends them in a migration, and restores that VLAN state in the destination. ChangeLog = v3: - remove the extra "From" line in patch 1 "virtio-net: do not reset vlan filtering at set_features" v2: https://lore.kernel.org/all/cover.1690100802.git.yin31...@gmail.com/ - remove the extra line pointed out by Eugenio in patch 3 "vdpa: Restore vlan filtering state" v1: https://lore.kernel.org/all/cover.1689690854.git.yin31...@gmail.com/ - based on patch "[PATCH 0/3] Vhost-vdpa Shadow Virtqueue VLAN support" at https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg01016.html - move `MAX_VLAN` macro to include/hw/virtio/virtio-net.h instead of net/vhost-vdpa.c - fix conflicts with the master branch TestStep 1. test the migration using the vp-vdpa device - For the L0 guest, boot QEMU with two virtio-net-pci net devices with the `ctrl_vq` and `ctrl_vlan` features on, with a command line like: -device virtio-net-pci,disable-legacy=on,disable-modern=off, iommu_platform=on,mq=on,ctrl_vq=on,guest_announce=off, indirect_desc=off,queue_reset=off,ctrl_vlan=on,... - For the L1 guest, apply the patch series and compile the source code, then start QEMU with two vdpa devices in svq mode with the `ctrl_vq` and `ctrl_vlan` features on, with a command line like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on, ctrl_vlan=on,... - For the L2 source guest, run the following bash command: ```bash #!/bin/sh for idx in {1..4094} do ip link add link eth0 name vlan$idx type vlan id $idx done ``` - Attach gdb to the L2 dest VM, break at vhost_vdpa_net_load_single_vlan(), and execute the following gdbscript: ```gdbscript ignore 1 4094 c ``` - Execute the live migration in the L2 source monitor - Result * with this series, gdb can hit the breakpoint and continue execution without triggering any error or warning. Eugenio Pérez (1): virtio-net: do not reset vlan filtering at set_features Hawkins Jiawei (3): virtio-net: Expose MAX_VLAN vdpa: Restore vlan filtering state vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ hw/net/virtio-net.c| 6 + include/hw/virtio/virtio-net.h | 6 + net/vhost-vdpa.c | 49 ++ 3 files changed, 56 insertions(+), 5 deletions(-) -- 2.25.1
[PATCH v3 1/4] virtio-net: do not reset vlan filtering at set_features
This function is called after virtio_load, so all vlan configuration is lost in migration case. Just allow all the vlan-tagged packets if vlan is not configured, and trust device reset to clear all filtered vlans. Fixes: 0b1eaa8803 ("virtio-net: Do not filter VLANs without F_CTRL_VLAN") Signed-off-by: Eugenio Pérez Reviewed-by: Hawkins Jiawei Signed-off-by: Hawkins Jiawei --- v3: - remove the extra "From" line v2: https://lore.kernel.org/all/95af0d013281282f48ad3f47f6ad1ac4ca9e52eb.1690100802.git.yin31...@gmail.com/ hw/net/virtio-net.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7102ec4817..d20d5a63cd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1006,9 +1006,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features) vhost_net_save_acked_features(nc->peer); } -if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { -memset(n->vlans, 0, MAX_VLAN >> 3); -} else { +if (!virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) { memset(n->vlans, 0xff, MAX_VLAN >> 3); } -- 2.25.1
[PATCH v3 4/4] vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 347241796d..73e9063fa0 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -111,6 +111,7 @@ static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_STATUS) | BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | BIT_ULL(VIRTIO_NET_F_CTRL_RX) | +BIT_ULL(VIRTIO_NET_F_CTRL_VLAN) | BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT_ULL(VIRTIO_NET_F_MQ) | BIT_ULL(VIRTIO_F_ANY_LAYOUT) | -- 2.25.1
[PATCH v3 3/4] vdpa: Restore vlan filtering state
This patch introduces vhost_vdpa_net_load_single_vlan() and vhost_vdpa_net_load_vlan() to restore the vlan filtering state at device's startup. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- v2: - remove the extra line pointed out by Eugenio v1: https://lore.kernel.org/all/0a568cc8a8d2b750c2e09b2237e9f05cece07c3f.1689690854.git.yin31...@gmail.com/ net/vhost-vdpa.c | 48 1 file changed, 48 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 9795306742..347241796d 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -965,6 +965,50 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s, return 0; } +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s, + const VirtIONet *n, + uint16_t vid) +{ +const struct iovec data = { +.iov_base = &vid, +.iov_len = sizeof(vid), +}; +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN, + VIRTIO_NET_CTRL_VLAN_ADD, + &data, 1); +if (unlikely(dev_written < 0)) { +return dev_written; +} +if (unlikely(*s->status != VIRTIO_NET_OK)) { +return -EIO; +} + +return 0; +} + +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s, +const VirtIONet *n) +{ +int r; + +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) { +return 0; +} + +for (int i = 0; i < MAX_VLAN >> 5; i++) { +for (int j = 0; n->vlans[i] && j <= 0x1f; j++) { +if (n->vlans[i] & (1U << j)) { +r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j); +if (unlikely(r != 0)) { +return r; +} +} +} +} + +return 0; +} + static int vhost_vdpa_net_load(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -995,6 +1039,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) if (unlikely(r)) { return r; } +r = vhost_vdpa_net_load_vlan(s, n); +if (unlikely(r)) { +return r; +} return 0; } -- 2.25.1
[PATCH v3 2/4] virtio-net: Expose MAX_VLAN
vhost-vdpa shadowed CVQ needs to know the maximum number of vlans supported by the virtio-net device, so QEMU can restore the VLAN state in a migration. Co-developed-by: Eugenio Pérez Signed-off-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- hw/net/virtio-net.c| 2 -- include/hw/virtio/virtio-net.h | 6 ++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index d20d5a63cd..a32672039d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -49,8 +49,6 @@ #define VIRTIO_NET_VM_VERSION11 -#define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ - /* previously fixed value */ #define VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE 256 #define VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE 256 diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index 5f5dcb4572..93f3bb5d97 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -38,6 +38,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET) /* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */ #define MAC_TABLE_ENTRIES64 +/* + * The maximum number of VLANs in the VLAN filter table + * added by VIRTIO_NET_CTRL_VLAN_ADD + */ +#define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ + typedef struct virtio_net_conf { uint32_t txtimer; -- 2.25.1
Re: [PATCH v2 3/4] vdpa: Restore vlan filtering state
On 2023/7/25 14:47, Jason Wang wrote: > On Sun, Jul 23, 2023 at 5:28 PM Hawkins Jiawei wrote: >> >> This patch introduces vhost_vdpa_net_load_single_vlan() >> and vhost_vdpa_net_load_vlan() to restore the vlan >> filtering state at device's startup. >> >> Co-developed-by: Eugenio Pérez >> Signed-off-by: Eugenio Pérez >> Signed-off-by: Hawkins Jiawei > > Acked-by: Jason Wang > > But this seems to be a latency killer, as it may send at most > 1024 commands. > > As discussed in the past, we need a better cvq command to do this: for > example, a single command to carry a bitmap. Hi Jason, Thanks for your review. You are right, we need some improvement here. Therefore, I have submitted another patch series titled "vdpa: Send all CVQ state load commands in parallel" at [1], which allows QEMU to delay polling and checking the device used buffer until either the SVQ is full or the control commands shadow buffers are full, so that QEMU can send all the SVQ control commands in parallel, which yields a better performance improvement. To test that patch series, I created 4094 VLANs in the guest to build an environment for sending multiple CVQ state load commands. According to the result on the real vdpa device at [2], this patch series can improve latency from 23296 us to 6539 us. Thanks! [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03726.html [2]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03947.html > > Thanks > >> --- >> v2: >> - remove the extra line pointed out by Eugenio >> >> v1: >> https://lore.kernel.org/all/0a568cc8a8d2b750c2e09b2237e9f05cece07c3f.1689690854.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 48 >> 1 file changed, 48 insertions(+) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 9795306742..347241796d 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -965,6 +965,50 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s, >> return 0; >> } >> >> +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s, >> + const VirtIONet *n, >> + uint16_t vid) >> +{ >> +const struct iovec data = { >> +.iov_base = &vid, >> +.iov_len = sizeof(vid), >> +}; >> +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN, >> + VIRTIO_NET_CTRL_VLAN_ADD, >> + &data, 1); >> +if (unlikely(dev_written < 0)) { >> +return dev_written; >> +} >> +if (unlikely(*s->status != VIRTIO_NET_OK)) { >> +return -EIO; >> +} >> + >> +return 0; >> +} >> + >> +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s, >> +const VirtIONet *n) >> +{ >> +int r; >> + >> +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) { >> +return 0; >> +} >> + >> +for (int i = 0; i < MAX_VLAN >> 5; i++) { >> +for (int j = 0; n->vlans[i] && j <= 0x1f; j++) { >> +if (n->vlans[i] & (1U << j)) { >> +r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j); >> +if (unlikely(r != 0)) { >> +return r; >> +} >> +} >> +} >> +} >> + >> +return 0; >> +} >> + >> static int vhost_vdpa_net_load(NetClientState *nc) >> { >> VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); >> @@ -995,6 +1039,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) >> if (unlikely(r)) { >> return r; >> } >> +r = vhost_vdpa_net_load_vlan(s, n); >> +if (unlikely(r)) { >> +return r; >> +} >> >> return 0; >> } >> -- >> 2.25.1 >> >
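For illustration, a command like the one Jason suggests might look roughly as follows on the QEMU side. VIRTIO_NET_CTRL_VLAN_BITMAP and its command number are pure assumptions (no such command exists in the VirtIO spec today, and the bitmap layout and endianness would have to be specified), but the sketch shows how the whole filter table could be loaded with a single call to vhost_vdpa_net_load_cmd():

```c
/* Hypothetical: one CVQ command carrying the whole VLAN filter bitmap,
 * instead of one VIRTIO_NET_CTRL_VLAN_ADD per vid. Not in the VirtIO spec. */
#define VIRTIO_NET_CTRL_VLAN_BITMAP  2   /* assumed value, after ADD/DEL */

static int vhost_vdpa_net_load_vlan_bitmap(VhostVDPAState *s,
                                           const VirtIONet *n)
{
    const struct iovec data = {
        .iov_base = (void *)n->vlans,
        .iov_len = MAX_VLAN >> 3,        /* 4096 bits -> 512 bytes */
    };
    ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN,
                                                  VIRTIO_NET_CTRL_VLAN_BITMAP,
                                                  &data, 1);
    if (unlikely(dev_written < 0)) {
        return dev_written;
    }
    return *s->status == VIRTIO_NET_OK ? 0 : -EIO;
}
```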
Re: [PATCH v3 0/3] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR
On 2023/8/5 14:15, Michael Tokarev wrote: > 04.07.2023 06:34, Hawkins Jiawei wrote: >> According to VirtIO standard, "The class, command and >> command-specific-data are set by the driver, >> and the device sets the ack byte. >> There is little it can do except issue a diagnostic >> if ack is not VIRTIO_NET_OK." >> >> Therefore, QEMU should stop sending the queued SVQ commands and >> cancel the device startup if the device's ack is not VIRTIO_NET_OK. >> >> Yet the problem is that, vhost_vdpa_net_load_x() returns 1 based on >> `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR. >> As a result, net->nc->info->load() also returns 1, this makes >> vhost_net_start_one() incorrectly assume the device state is >> successfully loaded by vhost_vdpa_net_load() and return 0, instead of >> goto `fail` label to cancel the device startup, as vhost_net_start_one() >> only cancels the device startup when net->nc->info->load() returns a >> negative value. >> >> This patchset fixes this problem by returning -EIO when the device's >> ack is not VIRTIO_NET_OK. >> >> Changelog >> = >> v3: >> - split the fixes suggested by Eugenio >> - return -EIO suggested by Michael >> >> v2: >> https://lore.kernel.org/all/69010e9ebb5e3729aef595ed92840f43e48e53e5.1687875592.git.yin31...@gmail.com/ >> - fix the same bug in vhost_vdpa_net_load_offloads() >> >> v1: https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/ >> >> Hawkins Jiawei (3): >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac() >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq() >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads() > > Hi! > > I don't remember why, but this patch series is marked as "check later" in > my qemu-stable-to-apply email folder. Does it make sense to back-port this > series to stable-8.0? > > 6f34807116 vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in > _load_offloads() > f45fd95ec9 vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq() > b479bc3c9d vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac() > Hi Michael, Yes, this bug exists in stable-8.0, so it makes sense to back-port this series. Commit f45fd95ec9("vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()") and commit b479bc3c9d("vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()") can be back-ported directly. > Patch 6f34807116 also needs > > b58d3686a0 vdpa: Add vhost_vdpa_net_load_offloads() As you point out, patch 6f34807116("vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()") is a fix to the commit b58d3686a0("vdpa: Add vhost_vdpa_net_load_offloads()"), which was introduced by patch series "Vhost-vdpa Shadow Virtqueue Offloads support" at [1]. This mentioned patch series introduces a new feature for QEMU and has not been merged into stable-8.0 yet, so I think we do not need to apply the patch 6f34807116("vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()") to stable-8.0. Sorry for not mentioning this information in the cover letter. Thanks! [1]. https://lore.kernel.org/all/cover.1685704856.git.yin31...@gmail.com/ > > for 8.0. > > Thanks, > > /mjt
[PATCH v2 0/2] Send all the SVQ control commands in parallel
This patchset allows QEMU to poll and check the device used buffer after sending all SVQ control commands, instead of polling and checking immediately after sending each SVQ control command, so that QEMU can send all the SVQ control commands in parallel, which yields a better performance improvement. I use vdpa_sim_net to simulate a vdpa device, and refactor vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, to build a test environment for sending multiple SVQ control commands. The monotonic time to finish vhost_vdpa_net_load() is as follows:

QEMU         microseconds
not patched  85.092
patched      79.222

So this is a saving of (85.092 - 79.222)/30 ≈ 0.2 us per command. This patchset resolves the GitLab issue at https://gitlab.com/qemu-project/qemu/-/issues/1578. v2: - recover accidentally deleted rows - remove extra newline - refactor `need_poll_len` to `cmds_in_flight` - return -EINVAL when vhost_svq_poll() returns 0 or the check on buffers written by the device fails - change the type of `in_cursor`, and refactor the code for updating the cursor - return directly when vhost_vdpa_net_load_{mac,mq}() returns a failure in vhost_vdpa_net_load() v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02668.html Hawkins Jiawei (2): vdpa: rename vhost_vdpa_net_cvq_add() vdpa: send CVQ state load commands in parallel net/vhost-vdpa.c | 165 +-- 1 file changed, 130 insertions(+), 35 deletions(-) -- 2.25.1
[PATCH v2 2/2] vdpa: send CVQ state load commands in parallel
This patch introduces the vhost_vdpa_net_cvq_add() and refactors the vhost_vdpa_net_load*(), so that QEMU can send CVQ state load commands in parallel. To be more specific, this patch introduces vhost_vdpa_net_cvq_add() to add SVQ control commands to SVQ and kick the device, but does not poll the device used buffers. QEMU will not poll and check the device used buffers in vhost_vdpa_net_load() until all CVQ state load commands have been sent to the device. What's more, in order to avoid buffer overwriting caused by using `svq->cvq_cmd_out_buffer` and `svq->status` as the buffer for all CVQ state load commands when sending CVQ state load commands in parallel, this patch introduces `out_cursor` and `in_cursor` in vhost_vdpa_net_load(), pointing to the available buffer for in descriptor and out descriptor, so that different CVQ state load commands can use their unique buffer. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 152 +-- 1 file changed, 120 insertions(+), 32 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 10804c7200..14e31ca5c5 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) vhost_vdpa_net_client_stop(nc); } +/** + * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ, + * kicks the device but does not poll the device used buffers. + * + * Return the number of elements added to SVQ if success. + */ +static int vhost_vdpa_net_cvq_add(VhostVDPAState *s, +void **out_cursor, size_t out_len, +virtio_net_ctrl_ack **in_cursor, size_t in_len) +{ +/* Buffers for the device */ +const struct iovec out = { +.iov_base = *out_cursor, +.iov_len = out_len, +}; +const struct iovec in = { +.iov_base = *in_cursor, +.iov_len = sizeof(virtio_net_ctrl_ack), +}; +VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0); +int r; + +r = vhost_svq_add(svq, &out, 1, &in, 1, NULL); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +return r; +} + +/* Update the cursor */ +*out_cursor += out_len; +*in_cursor += 1; + +return 1; +} + /** * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, * kicks the device and polls the device used buffers. @@ -628,69 +666,82 @@ static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, return vhost_svq_poll(svq); } -static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, - uint8_t cmd, const void *data, - size_t data_size) + +/** + * vhost_vdpa_net_load_cmd() restores the NIC state through SVQ. + * + * Return the number of elements added to SVQ if success. 
+ */ +static int vhost_vdpa_net_load_cmd(VhostVDPAState *s, +void **out_cursor, uint8_t class, uint8_t cmd, +const void *data, size_t data_size, +virtio_net_ctrl_ack **in_cursor) { const struct virtio_net_ctrl_hdr ctrl = { .class = class, .cmd = cmd, }; -assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); +assert(sizeof(ctrl) < vhost_vdpa_net_cvq_cmd_page_len() - + (*out_cursor - s->cvq_cmd_out_buffer)); +assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl) - + (*out_cursor - s->cvq_cmd_out_buffer)); -memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); +memcpy(*out_cursor, &ctrl, sizeof(ctrl)); +memcpy(*out_cursor + sizeof(ctrl), data, data_size); -return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size, - sizeof(virtio_net_ctrl_ack)); +return vhost_vdpa_net_cvq_add(s, out_cursor, sizeof(ctrl) + data_size, + in_cursor, sizeof(virtio_net_ctrl_ack)); } -static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) +/** + * vhost_vdpa_net_load_mac() restores the NIC mac through SVQ. + * + * Return the number of elements added to SVQ if success. + */ +static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n, +void **out_cursor, virtio_net_ctrl_ack **in_cursor) { uint64_t features = n->parent_obj.guest_features; if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { -ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC, -
[PATCH v2 1/2] vdpa: rename vhost_vdpa_net_cvq_add()
We want to introduce a new version of vhost_vdpa_net_cvq_add() that does not poll immediately after forwarding custom buffers to the device, so that QEMU can send all the SVQ control commands in parallel instead of serialized. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 99904a0da7..10804c7200 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -590,8 +590,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) vhost_vdpa_net_client_stop(nc); } -static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len, - size_t in_len) +/** + * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, + * kicks the device and polls the device used buffers. + * + * Return the length written by the device. + */ +static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, +size_t out_len, size_t in_len) { /* Buffers for the device */ const struct iovec out = { @@ -636,7 +642,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, +return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size, sizeof(virtio_net_ctrl_ack)); } @@ -753,7 +759,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, dev_written = sizeof(status); *s->status = VIRTIO_NET_OK; } else { -dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status)); +dev_written = vhost_vdpa_net_cvq_add_and_wait(s, out.iov_len, + sizeof(status)); if (unlikely(dev_written < 0)) { goto out; } -- 2.25.1
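Taken together, the two patches let several commands coexist in the shared buffers. The following toy model (standalone C, with made-up command payloads and sizes) illustrates the cursor discipline that patch 2/2 applies to `s->cvq_cmd_out_buffer` and `s->status`: each queued command claims a disjoint slice, so nothing is overwritten while earlier commands are still in flight:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t virtio_net_ctrl_ack;   /* one ack byte per command */
#define CMD_PAGE_LEN 4096              /* assumed page-sized out buffer */

int main(void)
{
    uint8_t out_buf[CMD_PAGE_LEN];     /* stands in for s->cvq_cmd_out_buffer */
    virtio_net_ctrl_ack acks[16];      /* stands in for s->status */
    uint8_t *out_cursor = out_buf;
    virtio_net_ctrl_ack *in_cursor = acks;

    const char *cmds[] = { "MAC_ADDR_SET", "MQ_VQ_PAIRS_SET" };
    for (size_t k = 0; k < 2; k++) {
        size_t len = strlen(cmds[k]) + 1;
        assert(out_cursor + len <= out_buf + CMD_PAGE_LEN);
        memcpy(out_cursor, cmds[k], len);   /* command k's private slice */
        out_cursor += len;                  /* next command starts after it */
        in_cursor += 1;                     /* one ack slot per command */
    }

    /* Both commands now occupy disjoint slices of the shared buffers, so
     * the device can ack each into its own slot once QEMU finally polls. */
    assert(in_cursor - acks == 2);
    return 0;
}
```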
[PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring
QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots() to check whether QEMU can add the element into SVQ. If there is enough space, then QEMU combines some out descriptors and some in descriptors into one descriptor chain, and adds it into `svq->vring.desc` by vhost_svq_vring_write_descs(). Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx` in vhost_svq_available_slots() returns the number of occupied elements, or the number of descriptor chains, instead of the number of occupied descriptors, which may cause wrapping in the SVQ descriptor ring. Here is an example. In vhost_handle_guest_kick(), QEMU forwards as many available buffers to the device by virtqueue_pop() and vhost_svq_add_element(). virtqueue_pop() returns a guest's element, and then this element is added into SVQ by vhost_svq_add_element(), a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() thinks QEMU just ran out of slots and everything should work fine. But in fact, virtqueue_pop() returns `svq->vring.num` elements or descriptor chains, more than `svq->vring.num` descriptors, due to guest memory fragmentation, and this causes wrapping in the SVQ descriptor ring. Therefore, this patch adds a `num_free` field to the VhostShadowVirtqueue structure and updates this field in vhost_svq_add() and vhost_svq_get_buf(), to record the number of free descriptors. Then we can avoid wrapping in the SVQ descriptor ring by refactoring vhost_svq_available_slots(). Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") Signed-off-by: Hawkins Jiawei --- hw/virtio/vhost-shadow-virtqueue.c | 9 - hw/virtio/vhost-shadow-virtqueue.h | 3 +++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 8361e70d1b..e1c6952b10 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp) */ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) { -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); +return svq->num_free; } /** @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, return -EINVAL; } +/* Update the size of SVQ vring free descriptors */ +svq->num_free -= ndescs; + svq->desc_state[qemu_head].elem = elem; svq->desc_state[qemu_head].ndescs = ndescs; vhost_svq_kick(svq); @@ -450,6 +453,9 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; +/* Update the size of SVQ vring free descriptors */ +svq->num_free += num; + *len = used_elem.len; return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, svq->iova_tree = iova_tree; svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq)); +svq->num_free = svq->vring.num; driver_size = vhost_svq_driver_area_size(svq); device_size = vhost_svq_device_area_size(svq); svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size); diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 926a4897b1..6efe051a70 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -107,6 +107,9 @@ typedef struct 
VhostShadowVirtqueue { /* Next head to consume from the device */ uint16_t last_used_idx; + +/* Size of SVQ vring free descriptors */ +uint16_t num_free; } VhostShadowVirtqueue; bool vhost_svq_valid_features(uint64_t features, Error **errp); -- 2.25.1
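A quick way to convince yourself of the arithmetic is to model the ring with the numbers from the example above (assumed: a 256-entry ring, every element fragmented into a 2-descriptor chain). The old check still reports 128 free "slots" at the exact point the ring has run out of descriptors:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed numbers: 256-entry SVQ ring, 2 descriptors per chain */
    const uint16_t vring_num = 256, ndescs = 2;
    uint16_t shadow_avail_idx = 0, shadow_used_idx = 0;
    uint16_t num_free = vring_num;

    /* Add chains until the ring really runs out of descriptors */
    while (num_free >= ndescs) {
        shadow_avail_idx++;     /* the old code counts elements... */
        num_free -= ndescs;     /* ...the fix counts descriptors */
    }

    printf("chains added: %d, descriptors used: %d\n",
           shadow_avail_idx, shadow_avail_idx * ndescs);
    /* Old check: 256 - (128 - 0) = 128, so it would keep adding and wrap */
    printf("old available_slots: %d\n",
           vring_num - (shadow_avail_idx - shadow_used_idx));
    /* New check: 0, so vhost_svq_add() correctly refuses further chains */
    printf("new available_slots: %d\n", num_free);
    return 0;
}
```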
Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring
Hi Eugenio, Thanks for reviewing. On 2023/5/9 1:26, Eugenio Perez Martin wrote: > On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei wrote: >> >> QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. >> In vhost_svq_add(), it uses vhost_svq_available_slots() to check >> whether QEMU can add the element into the SVQ. If there is >> enough space, then QEMU combines some out descriptors and >> some in descriptors into one descriptor chain, and add it into >> svq->vring.desc by vhost_svq_vring_write_descs(). >> >> Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx` >> in vhost_svq_available_slots() return the number of occupied elements, >> or the number of descriptor chains, instead of the number of occupied >> descriptors, which may cause wrapping in SVQ descriptor ring. >> >> Here is an example. In vhost_handle_guest_kick(), QEMU forwards >> as many available buffers to device by virtqueue_pop() and >> vhost_svq_add_element(). virtqueue_pop() return a guest's element, >> and use vhost_svq_add_elemnt(), a wrapper to vhost_svq_add(), to >> add this element into SVQ. If QEMU invokes virtqueue_pop() and >> vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() >> thinks QEMU just ran out of slots and everything should work fine. >> But in fact, virtqueue_pop() return `svq-vring.num` elements or >> descriptor chains, more than `svq->vring.num` descriptors, due to >> guest memory fragmentation, and this cause wrapping in SVQ descriptor ring. >> > > The bug is valid even before marking the descriptors used. If the > guest memory is fragmented, SVQ must add chains so it can try to add > more descriptors than possible. I will add this in the commit message in v2 patch. > >> Therefore, this patch adds `num_free` field in VhostShadowVirtqueue >> structure, updates this field in vhost_svq_add() and >> vhost_svq_get_buf(), to record the number of free descriptors. >> Then we can avoid wrap in SVQ descriptor ring by refactoring >> vhost_svq_available_slots(). >> >> Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") >> Signed-off-by: Hawkins Jiawei >> --- >> hw/virtio/vhost-shadow-virtqueue.c | 9 - >> hw/virtio/vhost-shadow-virtqueue.h | 3 +++ >> 2 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/hw/virtio/vhost-shadow-virtqueue.c >> b/hw/virtio/vhost-shadow-virtqueue.c >> index 8361e70d1b..e1c6952b10 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.c >> +++ b/hw/virtio/vhost-shadow-virtqueue.c >> @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error >> **errp) >>*/ >> static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) >> { >> -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); >> +return svq->num_free; >> } >> >> /** >> @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const >> struct iovec *out_sg, >> return -EINVAL; >> } >> >> +/* Update the size of SVQ vring free descriptors */ >> +svq->num_free -= ndescs; >> + >> svq->desc_state[qemu_head].elem = elem; >> svq->desc_state[qemu_head].ndescs = ndescs; >> vhost_svq_kick(svq); >> @@ -450,6 +453,9 @@ static VirtQueueElement >> *vhost_svq_get_buf(VhostShadowVirtqueue *svq, >> svq->desc_next[last_used_chain] = svq->free_head; >> svq->free_head = used_elem.id; >> >> +/* Update the size of SVQ vring free descriptors */ > > No need for this comment. > > Apart from that, > > Acked-by: Eugenio Pérez > Thanks for your suggestion. I will remove the comment in v2 patch, with this tag on. 
>> +svq->num_free += num; >> + >> *len = used_elem.len; >> return g_steal_pointer(&svq->desc_state[used_elem.id].elem); >> } >> @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, >> VirtIODevice *vdev, >> svq->iova_tree = iova_tree; >> >> svq->vring.num = virtio_queue_get_num(vdev, >> virtio_get_queue_index(vq)); >> +svq->num_free = svq->vring.num; >> driver_size = vhost_svq_driver_area_size(svq); >> device_size = vhost_svq_device_area_size(svq); >> svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), >> driver_size); >> diff --git a/hw/virtio/vhost-shadow-virtqueue.h >> b/hw/virtio/vhost-shadow-virtqueue.h >> index 926a4897b1..6efe051a70 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.h >> +++ b/hw/virtio/vhost-shadow-virtqueue.h >> @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue { >> >> /* Next head to consume from the device */ >> uint16_t last_used_idx; >> + >> +/* Size of SVQ vring free descriptors */ >> +uint16_t num_free; >> } VhostShadowVirtqueue; >> >> bool vhost_svq_valid_features(uint64_t features, Error **errp); >> -- >> 2.25.1 >> >
[PATCH v2] vhost: fix possible wrap in SVQ descriptor ring
QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots() to check whether QEMU can add the element into SVQ. If there is enough space, then QEMU combines some out descriptors and some in descriptors into one descriptor chain, and adds it into `svq->vring.desc` by vhost_svq_vring_write_descs(). Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx` in vhost_svq_available_slots() returns the number of occupied elements, or the number of descriptor chains, instead of the number of occupied descriptors, which may cause wrapping in SVQ descriptor ring. Here is an example. In vhost_handle_guest_kick(), QEMU forwards as many available buffers to device by virtqueue_pop() and vhost_svq_add_element(). virtqueue_pop() returns a guest's element, and then this element is added into SVQ by vhost_svq_add_element(), a wrapper to vhost_svq_add(). If QEMU invokes virtqueue_pop() and vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() thinks QEMU just ran out of slots and everything should work fine. But in fact, virtqueue_pop() returns `svq->vring.num` elements or descriptor chains, more than `svq->vring.num` descriptors due to guest memory fragmentation, and this causes wrapping in SVQ descriptor ring. This bug is valid even before marking the descriptors used. If the guest memory is fragmented, SVQ must add chains so it can try to add more descriptors than possible. This patch solves it by adding `num_free` field in VhostShadowVirtqueue structure and updating this field in vhost_svq_add() and vhost_svq_get_buf(), to record the number of free descriptors. Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") Signed-off-by: Hawkins Jiawei Acked-by: Eugenio Pérez --- v2: - update the commit message - remove the unnecessary comment - add the Acked-by tag v1: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg01727.html hw/virtio/vhost-shadow-virtqueue.c | 5 - hw/virtio/vhost-shadow-virtqueue.h | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 8361e70d1b..bd7c12b6d3 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp) */ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) { -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); +return svq->num_free; } /** @@ -263,6 +263,7 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, return -EINVAL; } +svq->num_free -= ndescs; svq->desc_state[qemu_head].elem = elem; svq->desc_state[qemu_head].ndescs = ndescs; vhost_svq_kick(svq); @@ -449,6 +450,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id); svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; +svq->num_free += num; *len = used_elem.len; return g_steal_pointer(&svq->desc_state[used_elem.id].elem); @@ -659,6 +661,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, svq->iova_tree = iova_tree; svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq)); +svq->num_free = svq->vring.num; driver_size = vhost_svq_driver_area_size(svq); device_size = vhost_svq_device_area_size(svq); svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size); diff --git 
a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 926a4897b1..6efe051a70 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue { /* Next head to consume from the device */ uint16_t last_used_idx; + +/* Size of SVQ vring free descriptors */ +uint16_t num_free; } VhostShadowVirtqueue; bool vhost_svq_valid_features(uint64_t features, Error **errp); -- 2.25.1
Re: [PATCH v2 2/2] vdpa: send CVQ state load commands in parallel
Sorry for forgetting cc when replying to the email. On Wed, 17 May 2023 at 16:22, Eugenio Perez Martin wrote: > > On Wed, May 17, 2023 at 7:22 AM Jason Wang wrote: > > > > On Sat, May 6, 2023 at 10:07 PM Hawkins Jiawei wrote: > > > > > > This patch introduces the vhost_vdpa_net_cvq_add() and > > > refactors the vhost_vdpa_net_load*(), so that QEMU can > > > send CVQ state load commands in parallel. > > > > > > To be more specific, this patch introduces vhost_vdpa_net_cvq_add() > > > to add SVQ control commands to SVQ and kick the device, > > > but does not poll the device used buffers. QEMU will not > > > poll and check the device used buffers in vhost_vdpa_net_load() > > > until all CVQ state load commands have been sent to the device. > > > > > > What's more, in order to avoid buffer overwriting caused by > > > using `svq->cvq_cmd_out_buffer` and `svq->status` as the > > > buffer for all CVQ state load commands when sending > > > CVQ state load commands in parallel, this patch introduces > > > `out_cursor` and `in_cursor` in vhost_vdpa_net_load(), > > > pointing to the available buffer for in descriptor and > > > out descriptor, so that different CVQ state load commands can > > > use their unique buffer. > > > > > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 > > > Signed-off-by: Hawkins Jiawei > > > --- > > > net/vhost-vdpa.c | 152 +-- > > > 1 file changed, 120 insertions(+), 32 deletions(-) > > > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > > index 10804c7200..14e31ca5c5 100644 > > > --- a/net/vhost-vdpa.c > > > +++ b/net/vhost-vdpa.c > > > @@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState > > > *nc) > > > vhost_vdpa_net_client_stop(nc); > > > } > > > > > > +/** > > > + * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ, > > > + * kicks the device but does not poll the device used buffers. > > > + * > > > + * Return the number of elements added to SVQ if success. > > > + */ > > > +static int vhost_vdpa_net_cvq_add(VhostVDPAState *s, > > > +void **out_cursor, size_t out_len, > > > > Can we track things like cursors in e.g VhostVDPAState ? > > > > Cursors will only be used at device startup. After that, cursors are > always the full buffer. Do we want to "pollute" VhostVDPAState with > things that will not be used after the startup? > > Maybe we can create a struct for them and then pass just this struct? Yes, Currently, the new version of vhost_vdpa_net_cvq_add() is only called in vhost_vdpa_net_load() at startup, so these cursors will not be used after startup. If needed, we can create a `VhostVDPACursor` as suggested by Eugenio. > > Thanks! 
> > > > +virtio_net_ctrl_ack **in_cursor, size_t > > > in_len) > > > +{ > > > +/* Buffers for the device */ > > > +const struct iovec out = { > > > +.iov_base = *out_cursor, > > > +.iov_len = out_len, > > > +}; > > > +const struct iovec in = { > > > +.iov_base = *in_cursor, > > > +.iov_len = sizeof(virtio_net_ctrl_ack), > > > +}; > > > +VhostShadowVirtqueue *svq = > > > g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0); > > > +int r; > > > + > > > +r = vhost_svq_add(svq, &out, 1, &in, 1, NULL); > > > +if (unlikely(r != 0)) { > > > +if (unlikely(r == -ENOSPC)) { > > > +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device > > > queue\n", > > > + __func__); > > > +} > > > +return r; > > > +} > > > + > > > +/* Update the cursor */ > > > +*out_cursor += out_len; > > > +*in_cursor += 1; > > > + > > > +return 1; > > > +} > > > + > > > /** > > > * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, > > > * kicks the device and polls the device used buffers. > > > @@ -628,69 +666,82 @@ static ssize_t > > > vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, > > > return vhost_svq_poll(svq); > > > } > > > > > > -static ssize_t vhost_vdpa_net_load_cmd(
Re: [PATCH v2 1/2] vdpa: rename vhost_vdpa_net_cvq_add()
Sorry for forgetting cc when replying to the email. I will resend this email with cc. On Wed, 17 May 2023 at 12:12, Jason Wang wrote: > > On Sat, May 6, 2023 at 10:07 PM Hawkins Jiawei wrote: > > > > We want to introduce a new version of vhost_vdpa_net_cvq_add() that > > does not poll immediately after forwarding custom buffers > > to the device, so that QEMU can send all the SVQ control commands > > in parallel instead of serialized. > > > > Signed-off-by: Hawkins Jiawei > > --- > > net/vhost-vdpa.c | 15 +++ > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 99904a0da7..10804c7200 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -590,8 +590,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) > > vhost_vdpa_net_client_stop(nc); > > } > > > > -static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len, > > - size_t in_len) > > +/** > > + * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, > > + * kicks the device and polls the device used buffers. > > + * > > + * Return the length written by the device. > > + */ > > +static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, > > Nit: is it better to use "poll" or "sync" other than wait? > > Other than this: > > Acked-by: Jason Wang Hi Jason, Thanks for your suggestion. I prefer 'poll', which makes it clearer that this function will poll immediately compared to the new version of vhost_vdpa_net_cvq_add(). I will refactor this in the v2 patch with the Acked-by tag on. Thanks! > > Thanks > > > +size_t out_len, size_t in_len) > > { > > /* Buffers for the device */ > > const struct iovec out = { > > @@ -636,7 +642,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState > > *s, uint8_t class, > > memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); > > memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); > > > > -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, > > +return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size, > >sizeof(virtio_net_ctrl_ack)); > > } > > > > @@ -753,7 +759,8 @@ static int > > vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, > > dev_written = sizeof(status); > > *s->status = VIRTIO_NET_OK; > > } else { > > -dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, > > sizeof(status)); > > +dev_written = vhost_vdpa_net_cvq_add_and_wait(s, out.iov_len, > > + sizeof(status)); > > if (unlikely(dev_written < 0)) { > > goto out; > > } > > -- > > 2.25.1 > > >
Re: [PATCH v2 2/2] vdpa: send CVQ state load commands in parallel
On 2023/5/18 14:12, Jason Wang wrote: > On Thu, May 18, 2023 at 2:00 PM Eugenio Perez Martin > wrote: >> >> On Thu, May 18, 2023 at 7:47 AM Jason Wang wrote: >>> >>> On Wed, May 17, 2023 at 11:02 PM Hawkins Jiawei wrote: >>>> >>>> Sorry for forgetting cc when replying to the email. >>>> >>>> On Wed, 17 May 2023 at 16:22, Eugenio Perez Martin >>>> wrote: >>>>> >>>>> On Wed, May 17, 2023 at 7:22 AM Jason Wang wrote: >>>>>> >>>>>> On Sat, May 6, 2023 at 10:07 PM Hawkins Jiawei >>>>>> wrote: >>>>>>> >>>>>>> This patch introduces the vhost_vdpa_net_cvq_add() and >>>>>>> refactors the vhost_vdpa_net_load*(), so that QEMU can >>>>>>> send CVQ state load commands in parallel. >>>>>>> >>>>>>> To be more specific, this patch introduces vhost_vdpa_net_cvq_add() >>>>>>> to add SVQ control commands to SVQ and kick the device, >>>>>>> but does not poll the device used buffers. QEMU will not >>>>>>> poll and check the device used buffers in vhost_vdpa_net_load() >>>>>>> until all CVQ state load commands have been sent to the device. >>>>>>> >>>>>>> What's more, in order to avoid buffer overwriting caused by >>>>>>> using `svq->cvq_cmd_out_buffer` and `svq->status` as the >>>>>>> buffer for all CVQ state load commands when sending >>>>>>> CVQ state load commands in parallel, this patch introduces >>>>>>> `out_cursor` and `in_cursor` in vhost_vdpa_net_load(), >>>>>>> pointing to the available buffer for in descriptor and >>>>>>> out descriptor, so that different CVQ state load commands can >>>>>>> use their unique buffer. >>>>>>> >>>>>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 >>>>>>> Signed-off-by: Hawkins Jiawei >>>>>>> --- >>>>>>> net/vhost-vdpa.c | 152 +-- >>>>>>> 1 file changed, 120 insertions(+), 32 deletions(-) >>>>>>> >>>>>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >>>>>>> index 10804c7200..14e31ca5c5 100644 >>>>>>> --- a/net/vhost-vdpa.c >>>>>>> +++ b/net/vhost-vdpa.c >>>>>>> @@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState >>>>>>> *nc) >>>>>>> vhost_vdpa_net_client_stop(nc); >>>>>>> } >>>>>>> >>>>>>> +/** >>>>>>> + * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ, >>>>>>> + * kicks the device but does not poll the device used buffers. >>>>>>> + * >>>>>>> + * Return the number of elements added to SVQ if success. >>>>>>> + */ >>>>>>> +static int vhost_vdpa_net_cvq_add(VhostVDPAState *s, >>>>>>> +void **out_cursor, size_t out_len, >>>>>> >>>>>> Can we track things like cursors in e.g VhostVDPAState ? >>>>>> >>>>> >>>>> Cursors will only be used at device startup. After that, cursors are >>>>> always the full buffer. Do we want to "pollute" VhostVDPAState with >>>>> things that will not be used after the startup? >>> >>> So it's the cursor of the cvq_cmd_out_buffer, it would be more elegant >>> to keep it with where the cvq_cmd_out_buffer is placed. It can avoid >>> passing cursors in several levels. >>> >> >> That's right, there is no reason to update at vhost_vdpa_net_cvq_add. >> It can be done at the caller. But in any case, these cursors need to be passed as separate parameters to vhost_vdpa_net_cvq_add(), so that they can be accessed for the `struct iovec` `iov_base` field, right? Since we always pass these parameters, I also update them together in vhost_vdpa_net_cvq_add() in this patch. 
If we want to avoid passing cursors through several levels, is it okay to back up `cvq_cmd_out_buffer` and `status` in vhost_vdpa_net_load(), access and update them through `VhostVDPAState` directly during loading, and restore them when loading finishes?
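For reference, the struct-based alternative Eugenio mentions might look roughly like this (a hypothetical sketch; the name and layout are assumptions, not code from this series):

```c
/* Hypothetical cursor struct, so callers pass one pointer instead of two;
 * the name and fields are assumed for illustration only. */
typedef struct VhostVDPACursor {
    void *out;                /* next free byte in s->cvq_cmd_out_buffer */
    virtio_net_ctrl_ack *in;  /* next free ack slot in s->status */
} VhostVDPACursor;

/* At startup, vhost_vdpa_net_load() could then do something like:
 *     VhostVDPACursor cursor = { s->cvq_cmd_out_buffer, s->status };
 *     vhost_vdpa_net_load_mac(s, n, &cursor);
 */
```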
[PATCH 2/2] vdpa: send CVQ state load commands in parallel
This patch introduces the vhost_vdpa_net_cvq_add() and refactors the vhost_vdpa_net_load*(), so that QEMU can send CVQ state load commands in parallel. To be more specific, this patch introduces vhost_vdpa_net_cvq_add() to add SVQ control commands to SVQ and kick the device, but does not poll the device used buffers. QEMU will not poll and check the device used buffers in vhost_vdpa_net_load() until all CVQ state load commands have been sent to the device. What's more, in order to avoid buffer overwriting caused by using `svq->cvq_cmd_out_buffer` and `svq->status` as the buffer for all CVQ state load commands when sending CVQ state load commands in parallel, this patch introduces `out_cursor` and `in_cursor` in vhost_vdpa_net_load(), pointing to the available buffer for in descriptor and out descriptor, so that different CVQ state load commands can use their unique buffer. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 137 +++ 1 file changed, 102 insertions(+), 35 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 10804c7200..d1f7113992 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) vhost_vdpa_net_client_stop(nc); } +/** + * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ, + * kicks the device but does not poll the device used buffers. + * + * Return the length of the device used buffers. + */ +static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, +void **out_cursor, size_t out_len, +void **in_cursor, size_t in_len) +{ +/* Buffers for the device */ +const struct iovec out = { +.iov_base = *out_cursor, +.iov_len = out_len, +}; +const struct iovec in = { +.iov_base = *in_cursor, +.iov_len = in_len, +}; +VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0); +int r; + +r = vhost_svq_add(svq, &out, 1, &in, 1, NULL); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +return r; +} + +/* Update the cursor */ +*out_cursor += out_len; +*in_cursor += in_len; + +return in_len; +} + /** * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, * kicks the device and polls the device used buffers. 
@@ -628,69 +666,64 @@ static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, return vhost_svq_poll(svq); } -static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, - uint8_t cmd, const void *data, - size_t data_size) +static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, +void **out_cursor, uint8_t class, uint8_t cmd, +const void *data, size_t data_size, +void **in_cursor) { const struct virtio_net_ctrl_hdr ctrl = { .class = class, .cmd = cmd, }; -assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); +assert(sizeof(ctrl) < vhost_vdpa_net_cvq_cmd_page_len() - + (*out_cursor - s->cvq_cmd_out_buffer)); +assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl) - + (*out_cursor - s->cvq_cmd_out_buffer)); -memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); +memcpy(*out_cursor, &ctrl, sizeof(ctrl)); +memcpy(*out_cursor + sizeof(ctrl), data, data_size); -return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size, - sizeof(virtio_net_ctrl_ack)); +return vhost_vdpa_net_cvq_add(s, out_cursor, sizeof(ctrl) + data_size, + in_cursor, sizeof(virtio_net_ctrl_ack)); } -static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) +static ssize_t vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n, + void **out_cursor, void **in_cursor) { uint64_t features = n->parent_obj.guest_features; if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { -ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC, - VIRTIO_NET_CTRL_MAC_ADDR_SET, - n->mac, sizeof(n->mac)); -if (unlikely(dev_written < 0)) { -return dev_written; -} - -return *s->status != VIRTIO_NE
[PATCH 1/2] vdpa: rename vhost_vdpa_net_cvq_add()
We want to introduce a new version of vhost_vdpa_net_cvq_add() that does not poll immediately after forwarding custom buffers to the device, so that QEMU can send all the SVQ control commands in parallel instead of serialized. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 99904a0da7..10804c7200 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -590,8 +590,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) vhost_vdpa_net_client_stop(nc); } -static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len, - size_t in_len) +/** + * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, + * kicks the device and polls the device used buffers. + * + * Return the length written by the device. + */ +static ssize_t vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, +size_t out_len, size_t in_len) { /* Buffers for the device */ const struct iovec out = { @@ -636,7 +642,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, +return vhost_vdpa_net_cvq_add_and_wait(s, sizeof(ctrl) + data_size, sizeof(virtio_net_ctrl_ack)); } @@ -753,7 +759,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, dev_written = sizeof(status); *s->status = VIRTIO_NET_OK; } else { -dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status)); +dev_written = vhost_vdpa_net_cvq_add_and_wait(s, out.iov_len, + sizeof(status)); if (unlikely(dev_written < 0)) { goto out; } -- 2.25.1
[PATCH 0/2] Send all the SVQ control commands in parallel
This patchset allows QEMU to poll and check the device used buffers after sending all SVQ control commands, instead of polling and checking immediately after sending each SVQ control command, so that QEMU can send all the SVQ control commands in parallel, which yields a performance improvement.

I use vdpa_sim_net to simulate a vdpa device, refactor vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, and refactor `net_vhost_vdpa_cvq_info.load` to call vhost_vdpa_net_load() 1000 times, to build a test environment for sending multiple SVQ control commands. Monotonic time to finish `net_vhost_vdpa_cvq_info.load`:

QEMU         monotonic time
-----------  --------------
not patched  89202
patched      80455

This patchset resolves the GitLab issue at https://gitlab.com/qemu-project/qemu/-/issues/1578.

Hawkins Jiawei (2): vdpa: rename vhost_vdpa_net_cvq_add() vdpa: send CVQ state load commands in parallel net/vhost-vdpa.c | 150 +++ 1 file changed, 112 insertions(+), 38 deletions(-) -- 2.25.1
[PATCH] vhost: fix possible wrap in SVQ descriptor ring
QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. In vhost_svq_add(), it uses vhost_svq_available_slots() to check whether QEMU can add the element into the SVQ. If there is enough space, then QEMU combines some out descriptors and some in descriptors into one descriptor chain, and adds it into svq->vring.desc by vhost_svq_vring_write_descs().

Yet the problem is that `svq->shadow_avail_idx - svq->shadow_used_idx` in vhost_svq_available_slots() returns the number of occupied elements, i.e. the number of descriptor chains, instead of the number of occupied descriptors, which may cause wrapping in the SVQ descriptor ring.

Here is an example. In vhost_handle_guest_kick(), QEMU forwards as many available buffers to the device as possible by virtqueue_pop() and vhost_svq_add_element(). virtqueue_pop() returns a guest's element, and vhost_svq_add_element(), a wrapper to vhost_svq_add(), adds this element into the SVQ. If QEMU invokes virtqueue_pop() and vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() thinks QEMU just ran out of slots and everything should work fine. But in fact, virtqueue_pop() returned `svq->vring.num` elements, i.e. descriptor chains, which can occupy more than `svq->vring.num` descriptors due to guest memory fragmentation, and this causes wrapping in the SVQ descriptor ring.

Therefore, this patch adds a `num_free` field to the VhostShadowVirtqueue structure and updates this field in vhost_svq_add() and vhost_svq_get_buf() to record the number of free descriptors. Then we can avoid the wrap in the SVQ descriptor ring by refactoring vhost_svq_available_slots().

Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") Signed-off-by: Hawkins Jiawei --- hw/virtio/vhost-shadow-virtqueue.c | 9 - hw/virtio/vhost-shadow-virtqueue.h | 3 +++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 8361e70d1b..e1c6952b10 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp) */ static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) { -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); +return svq->num_free; } /** @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, return -EINVAL; } +/* Update the size of SVQ vring free descriptors */ +svq->num_free -= ndescs; + svq->desc_state[qemu_head].elem = elem; svq->desc_state[qemu_head].ndescs = ndescs; vhost_svq_kick(svq); @@ -450,6 +453,9 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; +/* Update the size of SVQ vring free descriptors */ +svq->num_free += num; + *len = used_elem.len; return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, svq->iova_tree = iova_tree; svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq)); +svq->num_free = svq->vring.num; driver_size = vhost_svq_driver_area_size(svq); device_size = vhost_svq_device_area_size(svq); svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size); diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 926a4897b1..6efe051a70 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -107,6 +107,9 @@ typedef struct
VhostShadowVirtqueue { /* Next head to consume from the device */ uint16_t last_used_idx; + +/* Size of SVQ vring free descriptors */ +uint16_t num_free; } VhostShadowVirtqueue; bool vhost_svq_valid_features(uint64_t features, Error **errp); -- 2.25.1
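The off-by-fragmentation bug and the fix are easy to see in a toy model of the ring accounting (illustrative code, not QEMU's; a 256-entry ring and no indirect descriptors assumed). Counting chains would admit 256 additions unconditionally, while counting descriptors refuses as soon as fragmented elements exhaust the ring:

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_NUM 256   /* example ring size; the real one comes from the device */

static uint16_t num_free = RING_NUM;

/*
 * A chain built from out_num + in_num scattered buffers occupies that
 * many descriptors, so counting elements (chains) underestimates usage.
 */
static bool ring_add(uint16_t out_num, uint16_t in_num)
{
    uint16_t ndescs = out_num + in_num;

    if (num_free < ndescs) {
        return false;   /* adding now would wrap the descriptor ring */
    }
    num_free -= ndescs;
    return true;
}

/* Called when the device returns a chain of ndescs descriptors. */
static void ring_get_buf(uint16_t ndescs)
{
    num_free += ndescs;
}
```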
Re: [PATCH 0/2] Send all the SVQ control commands in parallel
On Thu, 20 Apr 2023 at 01:17, Eugenio Perez Martin wrote: > > On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei wrote: > > > > This patchset allows QEMU to poll and check the device used buffers > > after sending all SVQ control commands, instead of polling and checking > > immediately after sending each SVQ control command, so that QEMU can > > send all the SVQ control commands in parallel, which yields a > > performance improvement. > > > > I use vdpa_sim_net to simulate a vdpa device, refactor > > vhost_vdpa_net_load() to call vhost_vdpa_net_load_mac() 30 times, > > and refactor `net_vhost_vdpa_cvq_info.load` to call vhost_vdpa_net_load() > > 1000 times, > > Maybe a little bit too high for real scenarios but it gives us a hint > for sure :). Maybe it is more realistic to send ~10 or ~100 commands? Yes, it is absolutely too high for real scenarios to call vhost_vdpa_net_load() 1000 times. But considering that the time to execute vhost_vdpa_net_load_mac() 30 times is very short, the resulting time may be highly unstable and fluctuate greatly, so I call vhost_vdpa_net_load() 1000 times, hoping to get a more stable result. > > > to build a test environment for sending > > multiple SVQ control commands. Monotonic time to > > finish `net_vhost_vdpa_cvq_info.load`: > >
> > QEMU         monotonic time
> > -----------  --------------
> > not patched  89202
> > patched      80455
> >
> Is time expressed in seconds or milliseconds? I'm going to assume ms. I got this by calling g_get_monotonic_time(); it should be microseconds according to [1]. [1]. https://docs.gtk.org/glib/func.get_monotonic_time.html > So let's say all the time was spent in the context switch between qemu > and kernel, this is a save of (89202 - 80455)/3 = 0.3 ms per > command? Yes, I think it saves about 0.3 microseconds per command. Thanks! > > Thanks! > > > This patchset resolves the GitLab issue at > > https://gitlab.com/qemu-project/qemu/-/issues/1578. > > > > Hawkins Jiawei (2): > > vdpa: rename vhost_vdpa_net_cvq_add() > > vdpa: send CVQ state load commands in parallel > > > > net/vhost-vdpa.c | 150 +++ > > 1 file changed, 112 insertions(+), 38 deletions(-) > > > > -- > > 2.25.1 > > >
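For reference, the measurement can be taken with a small wrapper like the following sketch (the `time_load` helper and its callback signature are hypothetical; g_get_monotonic_time() is the real GLib call and returns a gint64 in microseconds, which is why the numbers above read in us):

```c
#include <glib.h>
#include <stdio.h>

/* `do_load` stands in for net_vhost_vdpa_cvq_info.load in the test. */
static void time_load(int (*do_load)(void *), void *opaque)
{
    gint64 start_us = g_get_monotonic_time();   /* microseconds */

    do_load(opaque);

    printf("load took %" G_GINT64_FORMAT " us\n",
           g_get_monotonic_time() - start_us);
}
```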
Re: [PATCH 2/2] vdpa: send CVQ state load commands in parallel
On Thu, 20 Apr 2023 at 01:43, Eugenio Perez Martin wrote: > > On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei wrote: > > > > This patch introduces the vhost_vdpa_net_cvq_add() and > > refactors the vhost_vdpa_net_load*(), so that QEMU can > > send CVQ state load commands in parallel. > > > > To be more specific, this patch introduces vhost_vdpa_net_cvq_add() > > to add SVQ control commands to SVQ and kick the device, > > but does not poll the device used buffers. QEMU will not > > poll and check the device used buffers in vhost_vdpa_net_load() > > until all CVQ state load commands have been sent to the device. > > > > What's more, in order to avoid buffer overwriting caused by > > using `svq->cvq_cmd_out_buffer` and `svq->status` as the > > buffer for all CVQ state load commands when sending > > CVQ state load commands in parallel, this patch introduces > > `out_cursor` and `in_cursor` in vhost_vdpa_net_load(), > > pointing to the available buffer for in descriptor and > > out descriptor, so that different CVQ state load commands can > > use their unique buffer. > > > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 > > Signed-off-by: Hawkins Jiawei > > --- > > net/vhost-vdpa.c | 137 +++ > > 1 file changed, 102 insertions(+), 35 deletions(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 10804c7200..d1f7113992 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -590,6 +590,44 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) > > vhost_vdpa_net_client_stop(nc); > > } > > > > +/** > > + * vhost_vdpa_net_cvq_add() adds SVQ control commands to SVQ, > > + * kicks the device but does not poll the device used buffers. > > + * > > + * Return the length of the device used buffers. > > + */ > > +static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, > > +void **out_cursor, size_t out_len, > > +void **in_cursor, size_t in_len) > > +{ > > +/* Buffers for the device */ > > +const struct iovec out = { > > +.iov_base = *out_cursor, > > +.iov_len = out_len, > > +}; > > +const struct iovec in = { > > +.iov_base = *in_cursor, > > +.iov_len = in_len, > > +}; > > +VhostShadowVirtqueue *svq = > > g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0); > > +int r; > > + > > +r = vhost_svq_add(svq, &out, 1, &in, 1, NULL); > > +if (unlikely(r != 0)) { > > +if (unlikely(r == -ENOSPC)) { > > +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device > > queue\n", > > + __func__); > > +} > > +return r; > > +} > > + > > +/* Update the cursor */ > > +*out_cursor += out_len; > > +*in_cursor += in_len; > > + > > +return in_len; > > +} > > + > > /** > > * vhost_vdpa_net_cvq_add_and_wait() adds SVQ control commands to SVQ, > > * kicks the device and polls the device used buffers. 
> > @@ -628,69 +666,64 @@ static ssize_t > > vhost_vdpa_net_cvq_add_and_wait(VhostVDPAState *s, > > return vhost_svq_poll(svq); > > } > > > > -static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, > > - uint8_t cmd, const void *data, > > - size_t data_size) > > +static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, > > +void **out_cursor, uint8_t class, uint8_t > > cmd, > > +const void *data, size_t data_size, > > +void **in_cursor) > > { > > const struct virtio_net_ctrl_hdr ctrl = { > > .class = class, > > .cmd = cmd, > > }; > > > > -assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); > > +assert(sizeof(ctrl) < vhost_vdpa_net_cvq_cmd_page_len() - > > + (*out_cursor - s->cvq_cmd_out_buffer)); > > +assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl) - > > + (*out_cursor - s->cvq_cmd_out_buffer)); > > > > -memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); > > -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); > > +
Re: [PATCH 2/2] vdpa: send CVQ state load commands in parallel
On Thu, 20 Apr 2023 at 19:38, Hawkins Jiawei wrote: > > On Thu, 20 Apr 2023 at 01:43, Eugenio Perez Martin > wrote: > > > > On Wed, Apr 19, 2023 at 1:50 PM Hawkins Jiawei wrote: > > > > > > + ++status) { > > > +if (*status != VIRTIO_NET_OK) { > > > +++r; > > > +} > > > +} > > > + > > > +return r; > > > > Although the caller is fine with >=0, I think we should keep the 0 == > > success. The number of commands delivered does not make a lot of sense > > for the callers, just if the call succeeded or not. > > Thanks for the explanation, I will refactor the patch as you suggested. I still have some questions about the check on the device used buffers. My initial thought was to return the number of commands whose `in buffer` value is not VIRTIO_NET_OK. If we are not supposed to return a value > 0, what should we return if some commands' `in buffer` value is not VIRTIO_NET_OK? Should we return an error code, such as EINVAL (Invalid argument), indicating that QEMU cannot successfully send all SVQ commands in the current state? Or should we simply not check the device used buffers at all, and return 0 when QEMU finishes polling? Thanks! > > > > > > Thanks! > > > > > } > > > > > > static NetClientInfo net_vhost_vdpa_cvq_info = { > > > -- > > > 2.25.1 > > > > >
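One possible shape for the "0 == success" convention discussed here, as a sketch (the helper name is hypothetical, and it assumes the acks land in one contiguous status buffer):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_OK 0   /* normally from the virtio_net headers */

/*
 * Collapse the per-command acks into "0 == success": callers only
 * learn whether the whole batch succeeded, not how many failed.
 */
static int check_statuses(const uint8_t *status, size_t cmds_in_flight)
{
    for (size_t i = 0; i < cmds_in_flight; i++) {
        if (status[i] != VIRTIO_NET_OK) {
            return -EIO;   /* which command failed is not reported */
        }
    }
    return 0;
}
```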
[PATCH 0/2] Vhost-vdpa Shadow Virtqueue Offloads support
This series enables the shadowed CVQ to intercept offloads commands, updates the virtio NIC device model so QEMU sends them in a migration, and restores that offloads state in the destination. Hawkins Jiawei (2): vdpa: Add vhost_vdpa_net_load_offloads vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ net/vhost-vdpa.c | 27 +++ 1 file changed, 27 insertions(+) -- 2.25.1
[PATCH 1/2] vdpa: Add vhost_vdpa_net_load_offloads
This patch introduces vhost_vdpa_net_load_offloads() to restore offloads state at device's startup. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 37cdc84562..682c749b19 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -680,6 +680,28 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, return *s->status != VIRTIO_NET_OK; } +static int vhost_vdpa_net_load_offloads(VhostVDPAState *s, +const VirtIONet *n) +{ +uint64_t features, offloads; +ssize_t dev_written; + +features = n->parent_obj.guest_features; +if (!(features & BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))) { +return 0; +} + +offloads = cpu_to_le64(n->curr_guest_offloads); +dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_GUEST_OFFLOADS, + VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, + &offloads, sizeof(offloads)); +if (unlikely(dev_written < 0)) { +return dev_written; +} + +return *s->status != VIRTIO_NET_OK; +} + static int vhost_vdpa_net_load(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -702,6 +724,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) if (unlikely(r)) { return r; } +r = vhost_vdpa_net_load_offloads(s, n); +if (unlikely(r)) { +return r; +} return 0; } -- 2.25.1
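For readers unfamiliar with this CVQ command, the bytes the patch sends look like the following sketch (the struct name is illustrative; per the VirtIO spec the payload is a single le64 bitmap of the guest offload bits, mirrored here from n->curr_guest_offloads):

```c
#include <stdint.h>

/* Illustrative on-the-wire view of the GUEST_OFFLOADS_SET command. */
struct guest_offloads_cmd {
    uint8_t class;          /* VIRTIO_NET_CTRL_GUEST_OFFLOADS */
    uint8_t cmd;            /* VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET */
    uint64_t offloads_le;   /* le64 bitmap, e.g. guest csum/TSO offload bits */
} __attribute__((packed));
```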
[PATCH 2/2] vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 682c749b19..cc52b7f0ad 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -85,6 +85,7 @@ const int vdpa_feature_bits[] = { static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_CSUM) | BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | +BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_MAC) | BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | -- 2.25.1
Re: [PATCH 0/2] Vhost-vdpa Shadow Virtqueue Offloads support
On 2023/5/30 0:55, Eugenio Perez Martin wrote: > On Mon, May 29, 2023 at 3:18 PM Hawkins Jiawei wrote: >> >> This series enables the shadowed CVQ to intercept offloads commands, >> updates the virtio NIC device model so QEMU sends them in a migration, >> and restores that offloads state in the destination. >> >> Hawkins Jiawei (2): >>vdpa: Add vhost_vdpa_net_load_offloads >>vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ >> >> net/vhost-vdpa.c | 27 +++ >> 1 file changed, 27 insertions(+) >> > > CCing MST too. OK, I will cc MST in future patches for this part. Thanks! > > Thanks! >
[RFC PATCH 0/2] Vhost-vdpa Shadow Virtqueue Hash calculation Support
This series enables the shadowed CVQ to intercept the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command, updates the virtio NIC device model so QEMU sends it in a migration, and restores that hash calculation state in the destination. Note that this patch should be based on patch "vdpa: Send all CVQ state load commands in parallel" at [1]. [1]. https://lore.kernel.org/all/cover.1689748694.git.yin31...@gmail.com/

TestStep 1. test the migration using vp-vdpa device - For L0 guest, boot QEMU with two virtio-net-pci net devices with the `ctrl_vq`, `mq`, `hash` features on, command line like: -netdev tap,... -device virtio-net-pci,disable-legacy=on,disable-modern=off, iommu_platform=on,mq=on,ctrl_vq=on,hash=on,guest_announce=off, indirect_desc=off,queue_reset=off,... - For L1 guest, apply the relevant patch series and compile the source code, start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`, `mq`, `hash` features on, command line like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on, hash=on,... - For L2 source guest, run the following bash command: ```bash #!/bin/sh ethtool -K eth0 rxhash on ``` - Attach gdb to the destination VM and break at vhost_vdpa_net_load_rss() - Execute the live migration in the L2 source monitor - Result * with this series, gdb can hit the breakpoint and continue executing without triggering any error or warning.

Hawkins Jiawei (2): vdpa: Restore hash calculation state vdpa: Allow VIRTIO_NET_F_HASH_REPORT in SVQ net/vhost-vdpa.c | 89 1 file changed, 89 insertions(+) -- 2.25.1
[RFC PATCH 1/2] vdpa: Restore hash calculation state
This patch introduces vhost_vdpa_net_load_rss() to restore the hash calculation state at device's startup. Note that vhost_vdpa_net_load_rss() has a `do_rss` argument, which allows future code to reuse this function to restore the receive-side scaling state when the VIRTIO_NET_F_RSS feature is enabled in SVQ. Currently, vhost_vdpa_net_load_rss() can only be invoked when `do_rss` is set to false. Signed-off-by: Hawkins Jiawei ---

Question: It seems that virtio_net_handle_rss() currently does not restore the hash key length parsed from the CVQ command sent from the guest into n->rss_data and uses the maximum key length in other code. So for the `hash_key_length` field in the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command sent to the device, is it okay to also use the maximum key length as its value? Or should we introduce a `hash_key_length` field in the n->rss_data structure to record the key length from the guest and use this value?

net/vhost-vdpa.c | 88 1 file changed, 88 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 7bb29f6009..bd51020771 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -788,6 +788,85 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n, return 0; } +static int vhost_vdpa_net_load_rss(VhostVDPAState *s, const VirtIONet *n, + void **out_cursor, void **in_cursor, + bool do_rss) +{ +struct virtio_net_rss_config cfg; +ssize_t r; + +/* + * According to VirtIO standard, "Initially the device has all hash + * types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.". + * + * Therefore, there is no need to send this CVQ command if the + * driver disables all hash types, which aligns with + * the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (!n->rss_data.enabled || +n->rss_data.hash_types == VIRTIO_NET_HASH_REPORT_NONE) { +return 0; +} + +cfg.hash_types = cpu_to_le32(n->rss_data.hash_types); +/* + * According to VirtIO standard, "Field reserved MUST contain zeroes. + * It is defined to make the structure to match the layout of + * virtio_net_rss_config structure, defined in 5.1.6.5.7.". + * + * Therefore, we need to zero the fields in struct virtio_net_rss_config, + * which corresponds the `reserved` field in + * struct virtio_net_hash_config. + */ +memset(&cfg.indirection_table_mask, 0, + sizeof_field(struct virtio_net_hash_config, reserved)); +/* + * Consider that virtio_net_handle_rss() currently does not restore the + * hash key length parsed from the CVQ command sent from the guest into + * n->rss_data and uses the maximum key length in other code, so we also + * employ the maximum key length here.
+ */ +cfg.hash_key_length = sizeof(n->rss_data.key); + +g_autofree uint16_t *table = g_malloc_n(n->rss_data.indirections_len, +sizeof(n->rss_data.indirections_table[0])); +for (int i = 0; i < n->rss_data.indirections_len; ++i) { +table[i] = cpu_to_le16(n->rss_data.indirections_table[i]); +} + +const struct iovec data[] = { +{ +.iov_base = &cfg, +.iov_len = offsetof(struct virtio_net_rss_config, +indirection_table), +}, { +.iov_base = table, +.iov_len = n->rss_data.indirections_len * + sizeof(n->rss_data.indirections_table[0]), +}, { +.iov_base = &cfg.max_tx_vq, +.iov_len = offsetof(struct virtio_net_rss_config, hash_key_data) - + offsetof(struct virtio_net_rss_config, max_tx_vq), +}, { +.iov_base = (void *)n->rss_data.key, +.iov_len = sizeof(n->rss_data.key), +} +}; + +r = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor, +VIRTIO_NET_CTRL_MQ, +VIRTIO_NET_CTRL_MQ_HASH_CONFIG, +data, ARRAY_SIZE(data)); +if (unlikely(r < 0)) { +return r; +} + +return 0; +} + static int vhost_vdpa_net_load_mq(VhostVDPAState *s, const VirtIONet *n, void **out_cursor, void **in_cursor) @@ -812,6 +891,15 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, return r; } +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_HASH_REPORT)) { +return 0; +} + +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); +if (unlikely(r < 0)) { +return r; +} + return 0; } -- 2.25.1
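To visualize why that memset works, here is an illustrative fixed-size rendering of the two layouts the comment refers to (the spec defines the table and key as variable-length; a one-entry table and a 40-byte key are pinned down for the example):

```c
#include <stdint.h>

/* struct virtio_net_hash_config, per the VirtIO spec */
struct hash_config_sketch {
    uint32_t hash_types;
    uint16_t reserved[4];         /* MUST be zero */
    uint8_t hash_key_length;
    uint8_t hash_key_data[40];
};

/* struct virtio_net_rss_config, with a 1-entry indirection table */
struct rss_config_sketch {
    uint32_t hash_types;
    uint16_t indirection_table_mask;   /* \                         */
    uint16_t unclassified_queue;       /*  \ these 8 bytes line up  */
    uint16_t indirection_table[1];     /*  / with `reserved` above  */
    uint16_t max_tx_vq;                /* /                         */
    uint8_t hash_key_length;
    uint8_t hash_key_data[40];
};
```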
[RFC PATCH 2/2] vdpa: Allow VIRTIO_NET_F_HASH_REPORT in SVQ
Enable SVQ with VIRTIO_NET_F_HASH_REPORT feature. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index bd51020771..a13b267250 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -118,6 +118,7 @@ static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) | /* VHOST_F_LOG_ALL is exposed by SVQ */ BIT_ULL(VHOST_F_LOG_ALL) | +BIT_ULL(VIRTIO_NET_F_HASH_REPORT) | BIT_ULL(VIRTIO_NET_F_RSC_EXT) | BIT_ULL(VIRTIO_NET_F_STANDBY) | BIT_ULL(VIRTIO_NET_F_SPEED_DUPLEX); -- 2.25.1
[RFC PATCH 1/3] vdpa: Add SetSteeringEBPF method for NetClientState
At present, to enable the VIRTIO_NET_F_RSS feature, eBPF must be loaded for the vhost backend. Given that vhost-vdpa is one of the vhost backends, we need to implement the SetSteeringEBPF method to support RSS for vhost-vdpa, even if vhost-vdpa calculates the RSS hash in the hardware device instead of in the kernel by eBPF. Although this requires QEMU to be compiled with the `--enable-bpf` configuration even if the vdpa device does not use eBPF to calculate the RSS hash, it avoids adding vDPA-specific conditional statements to enable the VIRTIO_NET_F_RSS feature, improving code maintainability.

Suggested-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 8 1 file changed, 8 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index a13b267250..4c8e4b19f6 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -238,6 +238,12 @@ static void vhost_vdpa_cleanup(NetClientState *nc) } } +/** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend */ +static bool vhost_vdpa_set_steering_ebpf(NetClientState *nc, int prog_fd) +{ +return true; +} + static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc) { assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); @@ -400,6 +406,7 @@ static NetClientInfo net_vhost_vdpa_info = { .has_vnet_hdr = vhost_vdpa_has_vnet_hdr, .has_ufo = vhost_vdpa_has_ufo, .check_peer_type = vhost_vdpa_check_peer_type, +.set_steering_ebpf = vhost_vdpa_set_steering_ebpf, }; static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index, @@ -1215,6 +1222,7 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { .has_vnet_hdr = vhost_vdpa_has_vnet_hdr, .has_ufo = vhost_vdpa_has_ufo, .check_peer_type = vhost_vdpa_check_peer_type, +.set_steering_ebpf = vhost_vdpa_set_steering_ebpf, }; /* -- 2.25.1
[RFC PATCH 2/3] vdpa: Restore receive-side scaling state
This patch reuses vhost_vdpa_net_load_rss() with some refactorings to restore the receive-side scaling state at device's startup. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 53 1 file changed, 35 insertions(+), 18 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 4c8e4b19f6..7870cbe142 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -820,17 +820,28 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, const VirtIONet *n, } cfg.hash_types = cpu_to_le32(n->rss_data.hash_types); -/* - * According to VirtIO standard, "Field reserved MUST contain zeroes. - * It is defined to make the structure to match the layout of - * virtio_net_rss_config structure, defined in 5.1.6.5.7.". - * - * Therefore, we need to zero the fields in struct virtio_net_rss_config, - * which corresponds the `reserved` field in - * struct virtio_net_hash_config. - */ -memset(&cfg.indirection_table_mask, 0, - sizeof_field(struct virtio_net_hash_config, reserved)); +if (do_rss) { +/* + * According to VirtIO standard, "Number of entries in indirection_table + * is (indirection_table_mask + 1)". + */ +cfg.indirection_table_mask = cpu_to_le16(n->rss_data.indirections_len - + 1); +cfg.unclassified_queue = cpu_to_le16(n->rss_data.default_queue); +cfg.max_tx_vq = cpu_to_le16(n->curr_queue_pairs); +} else { +/* + * According to VirtIO standard, "Field reserved MUST contain zeroes. + * It is defined to make the structure to match the layout of + * virtio_net_rss_config structure, defined in 5.1.6.5.7.". + * + * Therefore, we need to zero the fields in + * struct virtio_net_rss_config, which corresponds the `reserved` field + * in struct virtio_net_hash_config. + */ +memset(&cfg.indirection_table_mask, 0, + sizeof_field(struct virtio_net_hash_config, reserved)); +} /* * Consider that virtio_net_handle_rss() currently does not restore the * hash key length parsed from the CVQ command sent from the guest into @@ -866,6 +877,7 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, const VirtIONet *n, r = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor, VIRTIO_NET_CTRL_MQ, +do_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG : VIRTIO_NET_CTRL_MQ_HASH_CONFIG, data, ARRAY_SIZE(data)); if (unlikely(r < 0)) { @@ -899,13 +911,18 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, return r; } -if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_HASH_REPORT)) { -return 0; -} - -r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); -if (unlikely(r < 0)) { -return r; +if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_RSS)) { +/* Load the receive-side scaling state */ +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, true); +if (unlikely(r < 0)) { +return r; +} +} else if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_RSS)) { +/* Load the hash calculation state */ +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); +if (unlikely(r < 0)) { +return r; +} } return 0; -- 2.25.1
[RFC PATCH 0/3] Vhost-vdpa Shadow Virtqueue RSS Support
This series enables the shadowed CVQ to intercept the RSS command, updates the virtio NIC device model so QEMU sends it in a migration, and restores that RSS state in the destination. Note that this patch should be based on patch "Vhost-vdpa Shadow Virtqueue Hash calculation Support" at [1]. [1]. https://lore.kernel.org/all/cover.1691762906.git.yin31...@gmail.com/

TestStep 1. test the migration using vp-vdpa device - For L0 guest, boot QEMU with two virtio-net-pci net devices with `in-qemu` RSS, command line like: -netdev tap,vhost=off... -device virtio-net-pci,disable-legacy=on,disable-modern=off, iommu_platform=on,mq=on,ctrl_vq=on,hash=on,rss=on,guest_announce=off, indirect_desc=off,queue_reset=off,... - For L1 guest, apply the relevant patch series and compile the source code, start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`, `mq`, `rss` features on, command line like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on, rss=on,... - For L2 source guest, run the following bash command: ```bash #!/bin/sh ethtool -K eth0 rxhash on ``` - Execute the live migration in the L2 source monitor - Result * with this series, L2 QEMU can execute without triggering any error or warning. L0 QEMU echoes "Can't load eBPF RSS - fallback to software RSS".

Hawkins Jiawei (3): vdpa: Add SetSteeringEBPF method for NetClientState vdpa: Restore receive-side scaling state vdpa: Allow VIRTIO_NET_F_RSS in SVQ net/vhost-vdpa.c | 62 ++-- 1 file changed, 44 insertions(+), 18 deletions(-) -- 2.25.1
[RFC PATCH 3/3] vdpa: Allow VIRTIO_NET_F_RSS in SVQ
Enable SVQ with VIRTIO_NET_F_RSS feature. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 7870cbe142..eb08530396 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -119,6 +119,7 @@ static const uint64_t vdpa_svq_device_features = /* VHOST_F_LOG_ALL is exposed by SVQ */ BIT_ULL(VHOST_F_LOG_ALL) | BIT_ULL(VIRTIO_NET_F_HASH_REPORT) | +BIT_ULL(VIRTIO_NET_F_RSS) | BIT_ULL(VIRTIO_NET_F_RSC_EXT) | BIT_ULL(VIRTIO_NET_F_STANDBY) | BIT_ULL(VIRTIO_NET_F_SPEED_DUPLEX); -- 2.25.1
Re: [RFC PATCH 2/3] vdpa: Restore receive-side scaling state
On 2023/8/11 23:28, Hawkins Jiawei wrote: > This patch reuses vhost_vdpa_net_load_rss() with some > refactorings to restore the receive-side scaling state > at device's startup. > > Signed-off-by: Hawkins Jiawei > --- > net/vhost-vdpa.c | 53 > 1 file changed, 35 insertions(+), 18 deletions(-) > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > index 4c8e4b19f6..7870cbe142 100644 > --- a/net/vhost-vdpa.c > +++ b/net/vhost-vdpa.c > @@ -820,17 +820,28 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, > const VirtIONet *n, > } > > cfg.hash_types = cpu_to_le32(n->rss_data.hash_types); > -/* > - * According to VirtIO standard, "Field reserved MUST contain zeroes. > - * It is defined to make the structure to match the layout of > - * virtio_net_rss_config structure, defined in 5.1.6.5.7.". > - * > - * Therefore, we need to zero the fields in struct virtio_net_rss_config, > - * which corresponds the `reserved` field in > - * struct virtio_net_hash_config. > - */ > -memset(&cfg.indirection_table_mask, 0, > - sizeof_field(struct virtio_net_hash_config, reserved)); > +if (do_rss) { > +/* > + * According to VirtIO standard, "Number of entries in > indirection_table > + * is (indirection_table_mask + 1)". > + */ > +cfg.indirection_table_mask = > cpu_to_le16(n->rss_data.indirections_len - > + 1); > +cfg.unclassified_queue = cpu_to_le16(n->rss_data.default_queue); > +cfg.max_tx_vq = cpu_to_le16(n->curr_queue_pairs); > +} else { > +/* > + * According to VirtIO standard, "Field reserved MUST contain zeroes. > + * It is defined to make the structure to match the layout of > + * virtio_net_rss_config structure, defined in 5.1.6.5.7.". > + * > + * Therefore, we need to zero the fields in > + * struct virtio_net_rss_config, which corresponds the `reserved` > field > + * in struct virtio_net_hash_config. > + */ > +memset(&cfg.indirection_table_mask, 0, > + sizeof_field(struct virtio_net_hash_config, reserved)); > +} > /* >* Consider that virtio_net_handle_rss() currently does not restore the >* hash key length parsed from the CVQ command sent from the guest into > @@ -866,6 +877,7 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, > const VirtIONet *n, > > r = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor, > VIRTIO_NET_CTRL_MQ, > +do_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG : > VIRTIO_NET_CTRL_MQ_HASH_CONFIG, > data, ARRAY_SIZE(data)); > if (unlikely(r < 0)) { > @@ -899,13 +911,18 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, > return r; > } > > -if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_HASH_REPORT)) { > -return 0; > -} > - > -r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); > -if (unlikely(r < 0)) { > -return r; > +if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_RSS)) { > +/* Load the receive-side scaling state */ > +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, true); > +if (unlikely(r < 0)) { > +return r; > +} > +} else if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_RSS)) { The correct feature to be used here is VIRTIO_NET_F_HASH_REPORT, rather than VIRTIO_NET_F_RSS. I will correct this in the v2 patch. Thanks! > +/* Load the hash calculation state */ > +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); > +if (unlikely(r < 0)) { > +return r; > +} > } > > return 0;
[RFC PATCH v2 1/3] vdpa: Add SetSteeringEBPF method for NetClientState
At present, to enable the VIRTIO_NET_F_RSS feature, eBPF must be loaded for the vhost backend. Given that vhost-vdpa is one of the vhost backends, we need to implement the SetSteeringEBPF method to support RSS for vhost-vdpa, even if vhost-vdpa calculates the RSS hash in the hardware device instead of in the kernel by eBPF. Although this requires QEMU to be compiled with the `--enable-bpf` configuration even if the vdpa device does not use eBPF to calculate the RSS hash, it avoids adding vDPA-specific conditional statements to enable the VIRTIO_NET_F_RSS feature, improving code maintainability.

Suggested-by: Eugenio Pérez Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 8 1 file changed, 8 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index a13b267250..4c8e4b19f6 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -238,6 +238,12 @@ static void vhost_vdpa_cleanup(NetClientState *nc) } } +/** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend */ +static bool vhost_vdpa_set_steering_ebpf(NetClientState *nc, int prog_fd) +{ +return true; +} + static bool vhost_vdpa_has_vnet_hdr(NetClientState *nc) { assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); @@ -400,6 +406,7 @@ static NetClientInfo net_vhost_vdpa_info = { .has_vnet_hdr = vhost_vdpa_has_vnet_hdr, .has_ufo = vhost_vdpa_has_ufo, .check_peer_type = vhost_vdpa_check_peer_type, +.set_steering_ebpf = vhost_vdpa_set_steering_ebpf, }; static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index, @@ -1215,6 +1222,7 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { .has_vnet_hdr = vhost_vdpa_has_vnet_hdr, .has_ufo = vhost_vdpa_has_ufo, .check_peer_type = vhost_vdpa_check_peer_type, +.set_steering_ebpf = vhost_vdpa_set_steering_ebpf, }; /* -- 2.25.1
[RFC PATCH v2 2/3] vdpa: Restore receive-side scaling state
This patch reuses vhost_vdpa_net_load_rss() with some refactorings to restore the receive-side scaling state at device's startup. Signed-off-by: Hawkins Jiawei --- v2: - Correct the feature usage to VIRTIO_NET_F_HASH_REPORT when loading the hash calculation state v1: https://lore.kernel.org/all/93d5d82f0a5df71df326830033e50358c8b6be7a.1691766252.git.yin31...@gmail.com/ net/vhost-vdpa.c | 54 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 4c8e4b19f6..e21b3ac67a 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -820,17 +820,28 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, const VirtIONet *n, } cfg.hash_types = cpu_to_le32(n->rss_data.hash_types); -/* - * According to VirtIO standard, "Field reserved MUST contain zeroes. - * It is defined to make the structure to match the layout of - * virtio_net_rss_config structure, defined in 5.1.6.5.7.". - * - * Therefore, we need to zero the fields in struct virtio_net_rss_config, - * which corresponds the `reserved` field in - * struct virtio_net_hash_config. - */ -memset(&cfg.indirection_table_mask, 0, - sizeof_field(struct virtio_net_hash_config, reserved)); +if (do_rss) { +/* + * According to VirtIO standard, "Number of entries in indirection_table + * is (indirection_table_mask + 1)". + */ +cfg.indirection_table_mask = cpu_to_le16(n->rss_data.indirections_len - + 1); +cfg.unclassified_queue = cpu_to_le16(n->rss_data.default_queue); +cfg.max_tx_vq = cpu_to_le16(n->curr_queue_pairs); +} else { +/* + * According to VirtIO standard, "Field reserved MUST contain zeroes. + * It is defined to make the structure to match the layout of + * virtio_net_rss_config structure, defined in 5.1.6.5.7.". + * + * Therefore, we need to zero the fields in + * struct virtio_net_rss_config, which corresponds the `reserved` field + * in struct virtio_net_hash_config. + */ +memset(&cfg.indirection_table_mask, 0, + sizeof_field(struct virtio_net_hash_config, reserved)); +} /* * Consider that virtio_net_handle_rss() currently does not restore the * hash key length parsed from the CVQ command sent from the guest into @@ -866,6 +877,7 @@ static int vhost_vdpa_net_load_rss(VhostVDPAState *s, const VirtIONet *n, r = vhost_vdpa_net_load_cmd(s, out_cursor, in_cursor, VIRTIO_NET_CTRL_MQ, +do_rss ? VIRTIO_NET_CTRL_MQ_RSS_CONFIG : VIRTIO_NET_CTRL_MQ_HASH_CONFIG, data, ARRAY_SIZE(data)); if (unlikely(r < 0)) { @@ -899,13 +911,19 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, return r; } -if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_HASH_REPORT)) { -return 0; -} - -r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); -if (unlikely(r < 0)) { -return r; +if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_RSS)) { +/* Load the receive-side scaling state */ +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, true); +if (unlikely(r < 0)) { +return r; +} +} else if (virtio_vdev_has_feature(&n->parent_obj, + VIRTIO_NET_F_HASH_REPORT)) { +/* Load the hash calculation state */ +r = vhost_vdpa_net_load_rss(s, n, out_cursor, in_cursor, false); +if (unlikely(r < 0)) { +return r; +} } return 0; -- 2.25.1
[RFC PATCH v2 3/3] vdpa: Allow VIRTIO_NET_F_RSS in SVQ
Enable SVQ with VIRTIO_NET_F_RSS feature. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index e21b3ac67a..2a276ef528 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -119,6 +119,7 @@ static const uint64_t vdpa_svq_device_features = /* VHOST_F_LOG_ALL is exposed by SVQ */ BIT_ULL(VHOST_F_LOG_ALL) | BIT_ULL(VIRTIO_NET_F_HASH_REPORT) | +BIT_ULL(VIRTIO_NET_F_RSS) | BIT_ULL(VIRTIO_NET_F_RSC_EXT) | BIT_ULL(VIRTIO_NET_F_STANDBY) | BIT_ULL(VIRTIO_NET_F_SPEED_DUPLEX); -- 2.25.1
[RFC PATCH v2 0/3] Vhost-vdpa Shadow Virtqueue RSS Support
This series enables the shadowed CVQ to intercept the RSS command, updates the virtio NIC device model so QEMU sends it in a migration, and restores that RSS state in the destination. Note that this patch should be based on patch "Vhost-vdpa Shadow Virtqueue Hash calculation Support" at [1]. [1]. https://lore.kernel.org/all/cover.1691762906.git.yin31...@gmail.com/

ChangeLog v2: - Correct the feature usage to VIRTIO_NET_F_HASH_REPORT when loading the hash calculation state in patch "vdpa: Restore receive-side scaling state" v1: https://lore.kernel.org/all/cover.1691766252.git.yin31...@gmail.com/

TestStep 1. test the migration using vp-vdpa device - For L0 guest, boot QEMU with two virtio-net-pci net devices with `in-qemu` RSS, command line like: -netdev tap,vhost=off... -device virtio-net-pci,disable-legacy=on,disable-modern=off, iommu_platform=on,mq=on,ctrl_vq=on,hash=on,rss=on,guest_announce=off, indirect_desc=off,queue_reset=off,... - For L1 guest, apply the relevant patch series and compile the source code, start QEMU with two vdpa devices with svq mode on, enable the `ctrl_vq`, `mq`, `rss` features on, command line like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,mq=on,guest_announce=off,ctrl_vq=on, rss=on,... - For L2 source guest, run the following bash command: ```bash #!/bin/sh ethtool -K eth0 rxhash on ``` - Execute the live migration in the L2 source monitor - Result * with this series, L2 QEMU can execute without triggering any error or warning. L0 QEMU echoes "Can't load eBPF RSS - fallback to software RSS".

Hawkins Jiawei (3): vdpa: Add SetSteeringEBPF method for NetClientState vdpa: Restore receive-side scaling state vdpa: Allow VIRTIO_NET_F_RSS in SVQ net/vhost-vdpa.c | 63 ++-- 1 file changed, 45 insertions(+), 18 deletions(-) -- 2.25.1
Re: [PATCH v3 1/7] vdpa: Use iovec for vhost_vdpa_net_load_cmd()
On 2023/8/17 17:23, Eugenio Perez Martin wrote: > On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei wrote: >> >> According to VirtIO standard, "The driver MUST follow >> the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number, >> followed by that number of non-multicast MAC addresses, >> followed by another le32 number, followed by that number >> of multicast addresses." >> >> Considering that these data is not stored in contiguous memory, >> this patch refactors vhost_vdpa_net_load_cmd() to accept >> scattered data, eliminating the need for an addtional data copy or >> packing the data into s->cvq_cmd_out_buffer outside of >> vhost_vdpa_net_load_cmd(). >> >> Signed-off-by: Hawkins Jiawei >> --- >> v3: >>- rename argument name to `data_sg` and `data_num` >>- use iov_to_buf() suggested by Eugenio >> >> v2: >> https://lore.kernel.org/all/6d3dc0fc076564a03501e222ef1102a6a7a643af.1688051252.git.yin31...@gmail.com/ >>- refactor vhost_vdpa_load_cmd() to accept iovec suggested by >> Eugenio >> >> net/vhost-vdpa.c | 33 + >> 1 file changed, 25 insertions(+), 8 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 373609216f..31ef6ad6ec 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -620,29 +620,38 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState >> *s, size_t out_len, >> } >> >> static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, >> - uint8_t cmd, const void *data, >> - size_t data_size) >> + uint8_t cmd, const struct iovec >> *data_sg, >> + size_t data_num) >> { >> const struct virtio_net_ctrl_hdr ctrl = { >> .class = class, >> .cmd = cmd, >> }; >> +size_t data_size = iov_size(data_sg, data_num); >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> >> +/* pack the CVQ command header */ >> memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); >> -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); >> >> -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, >> +/* pack the CVQ command command-specific-data */ >> +iov_to_buf(data_sg, data_num, 0, >> + s->cvq_cmd_out_buffer + sizeof(ctrl), data_size); >> + >> +return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl), > > Nit, any reason for changing the order of the addends? sizeof(ctrl) + > data_size ? Hi Eugenio, Here the code should be changed to `sizeof(ctrl) + data_size` as you point out. Since this patch series has already been merged into master, I will submit a separate patch to correct this problem. > >> sizeof(virtio_net_ctrl_ack)); >> } >> >> static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) >> { >> if (virtio_vdev_has_feature(&n->parent_obj, >> VIRTIO_NET_F_CTRL_MAC_ADDR)) { >> +const struct iovec data = { >> +.iov_base = (void *)n->mac, > > Assign to void should always be valid, no need for casting here. Yes, assign to void should be valid for normal pointers. However, `n->mac` is an array and is treated as a const pointer. It will trigger the warning "error: initialization discards ‘const’ qualifier from pointer target type" if we don't add this cast. Thanks! 
> >> +.iov_len = sizeof(n->mac), >> +}; >> ssize_t dev_written = vhost_vdpa_net_load_cmd(s, >> VIRTIO_NET_CTRL_MAC, >> >> VIRTIO_NET_CTRL_MAC_ADDR_SET, >> - n->mac, sizeof(n->mac)); >> + &data, 1); >> if (unlikely(dev_written < 0)) { >> return dev_written; >> } >> @@ -665,9 +674,13 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, >> } >> >> mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs); >> +const struct iovec data = { >> +.iov_base = &mq, >> +.iov_len = sizeof(mq), >> +}; >> dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ, >> - VIRTIO_N
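The const-qualifier point can be reproduced in isolation with a few lines (the struct below is a hypothetical stand-in for VirtIONet):

```c
#include <stdint.h>
#include <sys/uio.h>

struct net_conf {
    uint8_t mac[6];
};

static struct iovec mac_iov(const struct net_conf *n)
{
    /*
     * Because `n` is const-qualified, `n->mac` decays to
     * `const uint8_t *`; initializing the plain `void *iov_base`
     * with it discards the qualifier, hence the explicit cast.
     */
    struct iovec data = {
        .iov_base = (void *)n->mac,
        .iov_len = sizeof(n->mac),
    };
    return data;
}
```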
Re: [PATCH v3 2/7] vdpa: Restore MAC address filtering state
On 2023/8/17 18:18, Eugenio Perez Martin wrote: > On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei wrote: >> >> This patch refactors vhost_vdpa_net_load_mac() to >> restore the MAC address filtering state at device's startup. >> >> Signed-off-by: Hawkins Jiawei >> --- >> v3: >>- return early if mismatch the condition suggested by Eugenio >> >> v2: >> https://lore.kernel.org/all/2f2560f749186c0eb1055f9926f464587e419eeb.1688051252.git.yin31...@gmail.com/ >>- use iovec suggested by Eugenio >>- avoid sending CVQ command in default state >> >> v1: >> https://lore.kernel.org/all/00f72fe154a882fd6dc15bc39e3a1ac63f9dadce.1687402580.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 52 >> 1 file changed, 52 insertions(+) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 31ef6ad6ec..7189ccafaf 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -660,6 +660,58 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, >> const VirtIONet *n) >> } >> } >> >> +/* >> + * According to VirtIO standard, "The device MUST have an >> + * empty MAC filtering table on reset.". >> + * >> + * Therefore, there is no need to send this CVQ command if the >> + * driver also sets an empty MAC filter table, which aligns with >> + * the device's defaults. >> + * >> + * Note that the device's defaults can mismatch the driver's >> + * configuration only at live migration. >> + */ >> +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX) || >> +n->mac_table.in_use == 0) { >> +return 0; >> +} >> + >> +uint32_t uni_entries = n->mac_table.first_multi, > > QEMU coding style prefers declarations at the beginning of the code > block. Previous uses of these variable names would need to be > refactored to met this rule. Hi Eugenio, Thanks for the detailed explanation. Since this patch series has already been merged into master, I will submit a separate patch to correct this problem. I will take care of this problem in the future. Thanks! > > Apart from that, > > Acked-by: Eugenio Pérez > >> + uni_macs_size = uni_entries * ETH_ALEN, >> + mul_entries = n->mac_table.in_use - uni_entries, >> + mul_macs_size = mul_entries * ETH_ALEN; >> +struct virtio_net_ctrl_mac uni = { >> +.entries = cpu_to_le32(uni_entries), >> +}; >> +struct virtio_net_ctrl_mac mul = { >> +.entries = cpu_to_le32(mul_entries), >> +}; >> +const struct iovec data[] = { >> +{ >> +.iov_base = &uni, >> +.iov_len = sizeof(uni), >> +}, { >> +.iov_base = n->mac_table.macs, >> +.iov_len = uni_macs_size, >> +}, { >> +.iov_base = &mul, >> +.iov_len = sizeof(mul), >> +}, { >> +.iov_base = &n->mac_table.macs[uni_macs_size], >> +.iov_len = mul_macs_size, >> +}, >> +}; >> +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, >> +VIRTIO_NET_CTRL_MAC, >> +VIRTIO_NET_CTRL_MAC_TABLE_SET, >> +data, ARRAY_SIZE(data)); >> +if (unlikely(dev_written < 0)) { >> +return dev_written; >> +} >> +if (*s->status != VIRTIO_NET_OK) { >> +return -EIO; >> +} >> + >> return 0; >> } >> >> -- >> 2.25.1 >> >
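As a reference for the four-iovec layout in the patch above, the assembled VIRTIO_NET_CTRL_MAC_TABLE_SET payload and its total size look like this (a sketch based on the spec wording quoted earlier in this thread; ETH_ALEN is defined locally for self-containedness):

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Payload assembled back-to-back by the four iovecs:
 *
 *   le32 entries                   (non-multicast count)
 *   u8   macs[entries][ETH_ALEN]   (non-multicast addresses)
 *   le32 entries                   (multicast count)
 *   u8   macs[entries][ETH_ALEN]   (multicast addresses)
 */
static size_t mac_table_payload_len(uint32_t uni_entries, uint32_t mul_entries)
{
    return sizeof(uint32_t) + (size_t)uni_entries * ETH_ALEN +
           sizeof(uint32_t) + (size_t)mul_entries * ETH_ALEN;
}
```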
Re: [PATCH v3 1/7] vdpa: Use iovec for vhost_vdpa_net_load_cmd()
On 2023/8/17 22:05, Eugenio Perez Martin wrote: > On Thu, Aug 17, 2023 at 2:42 PM Hawkins Jiawei wrote: >> >> On 2023/8/17 17:23, Eugenio Perez Martin wrote: >>> On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei wrote: >>>> >>>> According to VirtIO standard, "The driver MUST follow >>>> the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number, >>>> followed by that number of non-multicast MAC addresses, >>>> followed by another le32 number, followed by that number >>>> of multicast addresses." >>>> >>>> Considering that these data is not stored in contiguous memory, >>>> this patch refactors vhost_vdpa_net_load_cmd() to accept >>>> scattered data, eliminating the need for an addtional data copy or >>>> packing the data into s->cvq_cmd_out_buffer outside of >>>> vhost_vdpa_net_load_cmd(). >>>> >>>> Signed-off-by: Hawkins Jiawei >>>> --- >>>> v3: >>>> - rename argument name to `data_sg` and `data_num` >>>> - use iov_to_buf() suggested by Eugenio >>>> >>>> v2: >>>> https://lore.kernel.org/all/6d3dc0fc076564a03501e222ef1102a6a7a643af.1688051252.git.yin31...@gmail.com/ >>>> - refactor vhost_vdpa_load_cmd() to accept iovec suggested by >>>> Eugenio >>>> >>>>net/vhost-vdpa.c | 33 + >>>>1 file changed, 25 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >>>> index 373609216f..31ef6ad6ec 100644 >>>> --- a/net/vhost-vdpa.c >>>> +++ b/net/vhost-vdpa.c >>>> @@ -620,29 +620,38 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState >>>> *s, size_t out_len, >>>>} >>>> >>>>static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, >>>> - uint8_t cmd, const void *data, >>>> - size_t data_size) >>>> + uint8_t cmd, const struct iovec >>>> *data_sg, >>>> + size_t data_num) >>>>{ >>>>const struct virtio_net_ctrl_hdr ctrl = { >>>>.class = class, >>>>.cmd = cmd, >>>>}; >>>> +size_t data_size = iov_size(data_sg, data_num); >>>> >>>>assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - >>>> sizeof(ctrl)); >>>> >>>> +/* pack the CVQ command header */ >>>>memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); >>>> -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); >>>> >>>> -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, >>>> +/* pack the CVQ command command-specific-data */ >>>> +iov_to_buf(data_sg, data_num, 0, >>>> + s->cvq_cmd_out_buffer + sizeof(ctrl), data_size); >>>> + >>>> +return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl), >>> >>> Nit, any reason for changing the order of the addends? sizeof(ctrl) + >>> data_size ? >> >> Hi Eugenio, >> >> Here the code should be changed to `sizeof(ctrl) + data_size` as you >> point out. >> >> Since this patch series has already been merged into master, I will >> submit a separate patch to correct this problem. >> > > Ouch, I didn't realize that. No need to make it back again, I was just > trying to reduce lines changed. Ok, I got it. Regardless, thank you for your review! > >>> >>>> sizeof(virtio_net_ctrl_ack)); >>>>} >>>> >>>>static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet >>>> *n) >>>>{ >>>>if (virtio_vdev_has_feature(&n->parent_obj, >>>> VIRTIO_NET_F_CTRL_MAC_ADDR)) { >>>> +const struct iovec data = { >>>> +.iov_base = (void *)n->mac, >>> >>> Assign to void should always be valid, no need for casting here. >> >> Yes, assign to void should be valid for normal pointers. >> >> However, `n->mac` is an array and is treated as a const pointer. It will >> trigger the warning "error: initialization discards ‘const’ qualifier >> from poi
Re: [PATCH v3 1/8] vhost: Add argument to vhost_svq_poll()
On 2023/8/18 23:08, Eugenio Perez Martin wrote: > On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei wrote: >> > > The subject could be more explicit. What about "add count argument to > vhost_svq_poll"? Hi Eugenio, Thanks for reviewing. You are right, I will use this new subject in the v4 patch. Thanks! > > Apart from that: > Acked-by: Eugenio Pérez > >> Next patches in this series will no longer perform an >> immediate poll and check of the device's used buffers >> for each CVQ state load command. Instead, they will >> send CVQ state load commands in parallel by polling >> multiple pending buffers at once. >> >> To achieve this, this patch refactoring vhost_svq_poll() >> to accept a new argument `num`, which allows vhost_svq_poll() >> to wait for the device to use multiple elements, >> rather than polling for a single element. >> >> Signed-off-by: Hawkins Jiawei >> --- >> hw/virtio/vhost-shadow-virtqueue.c | 36 ++ >> hw/virtio/vhost-shadow-virtqueue.h | 2 +- >> net/vhost-vdpa.c | 2 +- >> 3 files changed, 24 insertions(+), 16 deletions(-) >> >> diff --git a/hw/virtio/vhost-shadow-virtqueue.c >> b/hw/virtio/vhost-shadow-virtqueue.c >> index 49e5aed931..e731b1d2ea 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.c >> +++ b/hw/virtio/vhost-shadow-virtqueue.c >> @@ -514,29 +514,37 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq, >> } >> >> /** >> - * Poll the SVQ for one device used buffer. >> + * Poll the SVQ to wait for the device to use the specified number >> + * of elements and return the total length written by the device. >>* >>* This function race with main event loop SVQ polling, so extra >>* synchronization is needed. >>* >> - * Return the length written by the device. >> + * @svq: The svq >> + * @num: The number of elements that need to be used >>*/ >> -size_t vhost_svq_poll(VhostShadowVirtqueue *svq) >> +size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num) >> { >> -int64_t start_us = g_get_monotonic_time(); >> -uint32_t len = 0; >> +size_t len = 0; >> +uint32_t r; >> >> -do { >> -if (vhost_svq_more_used(svq)) { >> -break; >> -} >> +while (num--) { >> +int64_t start_us = g_get_monotonic_time(); >> >> -if (unlikely(g_get_monotonic_time() - start_us > 10e6)) { >> -return 0; >> -} >> -} while (true); >> +do { >> +if (vhost_svq_more_used(svq)) { >> +break; >> +} >> + >> +if (unlikely(g_get_monotonic_time() - start_us > 10e6)) { >> +return len; >> +} >> +} while (true); >> + >> +vhost_svq_get_buf(svq, &r); >> +len += r; >> +} >> >> -vhost_svq_get_buf(svq, &len); >> return len; >> } >> >> diff --git a/hw/virtio/vhost-shadow-virtqueue.h >> b/hw/virtio/vhost-shadow-virtqueue.h >> index 6efe051a70..5bce67837b 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.h >> +++ b/hw/virtio/vhost-shadow-virtqueue.h >> @@ -119,7 +119,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq, >> int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, >> size_t out_num, const struct iovec *in_sg, size_t in_num, >> VirtQueueElement *elem); >> -size_t vhost_svq_poll(VhostShadowVirtqueue *svq); >> +size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num); >> >> void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); >> void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index dfd271c456..d1dd140bf6 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -625,7 +625,7 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, >> size_t out_len, >>* descriptor. 
Also, we need to take the answer before SVQ pulls by >> itself, >>* when BQL is released >>*/ >> -return vhost_svq_poll(svq); >> +return vhost_svq_poll(svq, 1); >> } >> >> static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, >> -- >> 2.25.1 >> >
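The calling pattern this change enables can be sketched as follows (`send_batch` is a hypothetical caller; vhost_svq_add() and vhost_svq_poll() are the real functions from this series, and the declarations from vhost-shadow-virtqueue.h are assumed):

```c
static size_t send_batch(VhostShadowVirtqueue *svq,
                         const struct iovec *out, const struct iovec *in,
                         size_t n_cmds)
{
    size_t queued = 0;

    for (size_t i = 0; i < n_cmds; i++) {
        /* Queue every command without waiting for its ack. */
        if (vhost_svq_add(svq, &out[i], 1, &in[i], 1, NULL) == 0) {
            queued++;
        }
    }

    /*
     * One poll for the whole batch: waits (up to 10 s per element)
     * for `queued` used buffers and returns the total length written
     * by the device.
     */
    return vhost_svq_poll(svq, queued);
}
```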
Re: [PATCH v3 2/8] vdpa: Use iovec for vhost_vdpa_net_cvq_add()
On 2023/8/18 23:23, Eugenio Perez Martin wrote: > On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei wrote: >> >> Next patches in this series will no longer perform an >> immediate poll and check of the device's used buffers >> for each CVQ state load command. Consequently, there >> will be multiple pending buffers in the shadow VirtQueue, >> making it a must for every control command to have its >> own buffer. >> >> To achieve this, this patch refactor vhost_vdpa_net_cvq_add() >> to accept `struct iovec`, which eliminates the coupling of >> control commands to `s->cvq_cmd_out_buffer` and `s->status`, >> allowing them to use their own buffer. >> >> Signed-off-by: Hawkins Jiawei >> --- >> net/vhost-vdpa.c | 38 -- >> 1 file changed, 20 insertions(+), 18 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index d1dd140bf6..6b16c8ece0 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -596,22 +596,14 @@ static void vhost_vdpa_net_cvq_stop(NetClientState *nc) >> vhost_vdpa_net_client_stop(nc); >> } >> >> -static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len, >> - size_t in_len) >> +static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, >> + struct iovec *out_sg, size_t out_num, >> + struct iovec *in_sg, size_t in_num) >> { >> -/* Buffers for the device */ >> -const struct iovec out = { >> -.iov_base = s->cvq_cmd_out_buffer, >> -.iov_len = out_len, >> -}; >> -const struct iovec in = { >> -.iov_base = s->status, >> -.iov_len = sizeof(virtio_net_ctrl_ack), >> -}; >> VhostShadowVirtqueue *svq = >> g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0); >> int r; >> >> -r = vhost_svq_add(svq, &out, 1, &in, 1, NULL); >> +r = vhost_svq_add(svq, out_sg, out_num, in_sg, in_num, NULL); >> if (unlikely(r != 0)) { >> if (unlikely(r == -ENOSPC)) { >> qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device >> queue\n", >> @@ -637,6 +629,15 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> .cmd = cmd, >> }; >> size_t data_size = iov_size(data_sg, data_num); >> +/* Buffers for the device */ >> +struct iovec out = { >> +.iov_base = s->cvq_cmd_out_buffer, >> +.iov_len = sizeof(ctrl) + data_size, >> +}; >> +struct iovec in = { >> +.iov_base = s->status, >> +.iov_len = sizeof(*s->status), >> +}; >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> >> @@ -647,8 +648,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> iov_to_buf(data_sg, data_num, 0, >> s->cvq_cmd_out_buffer + sizeof(ctrl), data_size); >> >> -return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl), >> - sizeof(virtio_net_ctrl_ack)); >> +return vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1); >> } >> >> static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) >> @@ -1222,9 +1222,7 @@ static int >> vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, >> struct iovec out = { >> .iov_base = s->cvq_cmd_out_buffer, >> }; >> -/* in buffer used for device model */ >> -const struct iovec in = { >> -.iov_base = &status, >> +struct iovec in = { > > What if instead of reusing "in" we declare a new struct iovec in the > condition that calls vhost_vdpa_net_cvq_add? Something in the line of > "device_in". > > I'm also ok with this code, but splitting them would reduce the > possibility of sending the wrong one to the device / virtio device > model by mistake. Hi Eugenio, Ok, I will refactor this part of code according to your suggestion in the v4 patch. Thanks! > > Thanks! 
> >> .iov_len = sizeof(status), >> }; >> ssize_t dev_written = -EINVAL; >> @@ -1232,6 +1230,8 @@ static int >> vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, >> out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0, >>s->c
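The calling convention introduced above is easier to see in isolation. Below is a minimal standalone sketch (plain C, not QEMU code; `fake_svq_add()` and the buffer names are made up for illustration) of what the refactor buys: every command can own its out/in buffers instead of being hard-wired to `s->cvq_cmd_out_buffer` and `s->status`:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

/* Illustrative stand-in for vhost_svq_add(): just reports what it got. */
static int fake_svq_add(const struct iovec *out, size_t out_num,
                        const struct iovec *in, size_t in_num)
{
    printf("queued %zu out iovec(s), %zu in iovec(s), first out len %zu\n",
           out_num, in_num, out[0].iov_len);
    return 0;
}

int main(void)
{
    /* Each command now owns its buffers ... */
    uint8_t cmd_buf[64];
    uint8_t status = 0xff;     /* one-byte ack written back by the device */

    struct iovec out = { .iov_base = cmd_buf, .iov_len = sizeof(cmd_buf) };
    struct iovec in  = { .iov_base = &status, .iov_len = sizeof(status) };

    memset(cmd_buf, 0, sizeof(cmd_buf));
    return fake_svq_add(&out, 1, &in, 1);
}
```

Passing the iovecs through unchanged is what later allows several in-flight commands to coexist, each pointing at a different slice of the shadow buffers.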
Re: [PATCH v3 4/8] vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()
On 2023/8/18 23:39, Eugenio Perez Martin wrote: > On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei wrote: >> >> Next patches in this series will refactor vhost_vdpa_net_load_cmd() >> to iterate through the control commands shadow buffers, allowing QEMU >> to send CVQ state load commands in parallel at device startup. >> >> Considering that QEMU always forwards the CVQ command serialized >> outside of vhost_vdpa_net_load(), it is more elegant to send the >> CVQ commands directly without invoking vhost_vdpa_net_load_*() helpers. >> >> Signed-off-by: Hawkins Jiawei >> --- >> net/vhost-vdpa.c | 17 ++--- >> 1 file changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index dd71008e08..ae8f59adaa 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -1098,12 +1098,14 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { >>*/ >> static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, >> VirtQueueElement >> *elem, >> - struct iovec *out) >> + struct iovec *out, >> + struct iovec *in) >> { >> struct virtio_net_ctrl_mac mac_data, *mac_ptr; >> struct virtio_net_ctrl_hdr *hdr_ptr; >> uint32_t cursor; >> ssize_t r; >> +uint8_t on = 1; >> >> /* parse the non-multicast MAC address entries from CVQ command */ >> cursor = sizeof(*hdr_ptr); >> @@ -1151,7 +1153,16 @@ static int >> vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, >>* filter table to the vdpa device, it should send the >>* VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode >>*/ >> -r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1); >> +cursor = 0; >> +hdr_ptr = out->iov_base; >> +out->iov_len = sizeof(*hdr_ptr) + sizeof(on); >> +assert(out->iov_len < vhost_vdpa_net_cvq_cmd_page_len()); >> + >> +hdr_ptr->class = VIRTIO_NET_CTRL_RX; >> +hdr_ptr->cmd = VIRTIO_NET_CTRL_RX_PROMISC; >> +cursor += sizeof(*hdr_ptr); >> +*(uint8_t *)(out->iov_base + cursor) = on; >> +r = vhost_vdpa_net_cvq_add(s, out, 1, in, 1); > > Can this be done with iov_from_buf? Hi Eugenio, Yes, this should be done by iov_from_buf(), I will refactor the code according to your suggestion. Thanks! > >> if (unlikely(r < 0)) { >> return r; >> } >> @@ -1264,7 +1275,7 @@ static int >> vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, >>* the CVQ command direclty. >>*/ >> dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem, >> - &out); >> + &out, >> &in); >> if (unlikely(dev_written < 0)) { >> goto out; >> } >> -- >> 2.25.1 >> >
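For reference, the VIRTIO_NET_CTRL_RX_PROMISC command being packed in this hunk is tiny: a two-byte header followed by a one-byte argument. A standalone sketch of the layout (constant values copied from the VirtIO spec; `memcpy` stands in for QEMU's iov_from_buf() on a single contiguous buffer):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the VirtIO constants/layout; values match the spec. */
#define VIRTIO_NET_CTRL_RX          0
#define VIRTIO_NET_CTRL_RX_PROMISC  0

struct virtio_net_ctrl_hdr {
    uint8_t class;
    uint8_t cmd;
} __attribute__((packed));

int main(void)
{
    uint8_t buf[8];
    struct virtio_net_ctrl_hdr hdr = {
        .class = VIRTIO_NET_CTRL_RX,
        .cmd   = VIRTIO_NET_CTRL_RX_PROMISC,
    };
    uint8_t on = 1;

    /* iov_from_buf() in QEMU performs this copy into an iovec; memcpy
     * models it for one contiguous buffer. */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &on, sizeof(on));

    printf("command is %zu bytes: class=%u cmd=%u on=%u\n",
           sizeof(hdr) + sizeof(on),
           (unsigned)buf[0], (unsigned)buf[1], (unsigned)buf[2]);
    return 0;
}
```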
Re: [PATCH v3 6/8] vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()
On 2023/8/18 23:48, Eugenio Perez Martin wrote: > On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei wrote: >> >> This patch moves vhost_svq_poll() to the caller of >> vhost_vdpa_net_cvq_add() and introduces a helper funtion. >> >> By making this change, next patches in this series is >> able to refactor vhost_vdpa_net_load_x() only to delay >> the polling and checking process until either the SVQ >> is full or control commands shadow buffers are full. >> >> Signed-off-by: Hawkins Jiawei >> --- >> net/vhost-vdpa.c | 50 ++-- >> 1 file changed, 40 insertions(+), 10 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index fe0ba19724..d06f38403f 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -609,15 +609,21 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState >> *s, >> qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device >> queue\n", >> __func__); >> } >> -return r; >> } >> >> -/* >> - * We can poll here since we've had BQL from the time we sent the >> - * descriptor. Also, we need to take the answer before SVQ pulls by >> itself, >> - * when BQL is released >> - */ >> -return vhost_svq_poll(svq, 1); >> +return r; >> +} >> + >> +/* >> + * Convenience wrapper to poll SVQ for multiple control commands. >> + * >> + * Caller should hold the BQL when invoking this function, and should take >> + * the answer before SVQ pulls by itself when BQL is released. >> + */ >> +static ssize_t vhost_vdpa_net_svq_poll(VhostVDPAState *s, size_t >> cmds_in_flight) >> +{ >> +VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, >> 0); >> +return vhost_svq_poll(svq, cmds_in_flight); >> } >> >> /* Convenience wrapper to get number of available SVQ descriptors */ >> @@ -645,6 +651,7 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> .iov_base = s->status, >> .iov_len = sizeof(*s->status), >> }; >> +ssize_t r; >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> /* Each CVQ command has one out descriptor and one in descriptor */ >> @@ -657,7 +664,16 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> iov_to_buf(data_sg, data_num, 0, >> s->cvq_cmd_out_buffer + sizeof(ctrl), data_size); >> >> -return vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1); >> +r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1); >> +if (unlikely(r < 0)) { >> +return r; >> +} >> + >> +/* >> + * We can poll here since we've had BQL from the time >> + * we sent the descriptor. >> + */ >> +return vhost_vdpa_net_svq_poll(s, 1); >> } >> >> static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) >> @@ -1152,6 +1168,12 @@ static int >> vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, >> if (unlikely(r < 0)) { >> return r; >> } >> + >> +/* >> + * We can poll here since we've had BQL from the time >> + * we sent the descriptor. >> + */ >> +vhost_vdpa_net_svq_poll(s, 1); > > Don't we need to check the return value of vhost_vdpa_net_svq_poll here? Hi Eugenio, Yes, we should always check the return value of vhost_vdpa_net_svq_poll(). I will fix this problem in the v4 patch. Thanks! 
> >> if (*s->status != VIRTIO_NET_OK) { >> return sizeof(*s->status); >> } >> @@ -1266,10 +1288,18 @@ static int >> vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, >> goto out; >> } >> } else { >> -dev_written = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1); >> -if (unlikely(dev_written < 0)) { >> +ssize_t r; >> +r = vhost_vdpa_net_cvq_add(s, &out, 1, &in, 1); >> +if (unlikely(r < 0)) { >> +dev_written = r; >> goto out; >> } >> + >> +/* >> + * We can poll here since we've had BQL from the time >> + * we sent the descriptor. >> + */ >> +dev_written = vhost_vdpa_net_svq_poll(s, 1); >> } >> >> if (unlikely(dev_written < sizeof(status))) { >> -- >> 2.25.1 >> >
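The essence of this change is that submitting a command and collecting its ack become separate steps, with the caller deciding when to poll. A toy model of that control flow (counters stand in for the real shadow virtqueue state; nothing here is QEMU code):

```c
#include <stddef.h>
#include <stdio.h>

/* Toy model of the split: "add" only queues, "poll" collects acks. */
static size_t in_flight;

static int model_cvq_add(void)
{
    in_flight++;                 /* descriptor queued, no waiting here */
    return 0;
}

static size_t model_svq_poll(size_t n)
{
    size_t done = n <= in_flight ? n : in_flight;
    in_flight -= done;           /* pretend the device acked them */
    return done;
}

int main(void)
{
    /* Old behaviour: poll right after every add. */
    model_cvq_add();
    printf("acked %zu of 1\n", model_svq_poll(1));

    /* New behaviour: several adds, then one batched poll by the caller. */
    model_cvq_add();
    model_cvq_add();
    model_cvq_add();
    printf("acked %zu of 3\n", model_svq_poll(3));
    return 0;
}
```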
Re: [PATCH v3 8/8] vdpa: Send cvq state load commands in parallel
On 2023/8/19 01:27, Eugenio Perez Martin wrote: > On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei wrote: >> >> This patch enables sending CVQ state load commands >> in parallel at device startup by following steps: >> >>* Refactor vhost_vdpa_net_load_cmd() to iterate through >> the control commands shadow buffers. This allows different >> CVQ state load commands to use their own unique buffers. >> >>* Delay the polling and checking of buffers until either >> the SVQ is full or control commands shadow buffers are full. >> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 >> Signed-off-by: Hawkins Jiawei >> --- >> net/vhost-vdpa.c | 157 +-- >> 1 file changed, 96 insertions(+), 61 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 795c9c1fd2..1ebb58f7f6 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -633,6 +633,26 @@ static uint16_t >> vhost_vdpa_net_svq_available_slots(VhostVDPAState *s) >> return vhost_svq_available_slots(svq); >> } >> >> +/* >> + * Poll SVQ for multiple pending control commands and check the device's >> ack. >> + * >> + * Caller should hold the BQL when invoking this function. >> + */ >> +static ssize_t vhost_vdpa_net_svq_flush(VhostVDPAState *s, >> +size_t cmds_in_flight) >> +{ >> +vhost_vdpa_net_svq_poll(s, cmds_in_flight); >> + >> +/* Device should and must use only one byte ack each control command */ >> +assert(cmds_in_flight < vhost_vdpa_net_cvq_cmd_page_len()); >> +for (int i = 0; i < cmds_in_flight; ++i) { >> +if (s->status[i] != VIRTIO_NET_OK) { >> +return -EIO; >> +} >> +} >> +return 0; >> +} >> + >> static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void >> **out_cursor, >> void **in_cursor, uint8_t class, >> uint8_t cmd, const struct iovec >> *data_sg, >> @@ -642,19 +662,41 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, void **out_cursor, >> .class = class, >> .cmd = cmd, >> }; >> -size_t data_size = iov_size(data_sg, data_num); >> +size_t data_size = iov_size(data_sg, data_num), >> + left_bytes = vhost_vdpa_net_cvq_cmd_page_len() - >> +(*out_cursor - s->cvq_cmd_out_buffer); >> /* Buffers for the device */ >> struct iovec out = { >> -.iov_base = *out_cursor, >> .iov_len = sizeof(ctrl) + data_size, >> }; >> struct iovec in = { >> -.iov_base = *in_cursor, >> .iov_len = sizeof(*s->status), >> }; >> ssize_t r; >> >> -assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> +if (sizeof(ctrl) > left_bytes || data_size > left_bytes - sizeof(ctrl) >> || > > I'm ok with this code, but maybe we can simplify the code if we use > two struct iovec as cursors instead of a void **? I think functions > like iov_size and iov_copy already take care of a few checks here. Hi Eugenio, Thanks for the explanation, I will refactor the patch according to your suggestion! > > Apart from that it would be great to merge this call to > vhost_vdpa_net_svq_flush, but I find it very hard to do unless we > scatter it through all callers of vhost_vdpa_net_load_cmd. Yes, I agree with you. Maybe we can consider refactoring like this in the future if needed. > > Apart from the minor comments I think the series is great, thanks! Thanks for your review:)! > >> +vhost_vdpa_net_svq_available_slots(s) < 2) { >> +/* >> + * It is time to flush all pending control commands if SVQ is full >> + * or control commands shadow buffers are full. >> + * >> + * We can poll here since we've had BQL from the time >> + * we sent the descriptor. 
>> + */ >> +r = vhost_vdpa_net_svq_flush(s, *in_cursor - (void *)s->status); >> +if (unlikely(r < 0)) { >> +return r; >> +} >> + >> +*out_cursor = s->cvq_cmd_out_buffer; >> +*in_cursor = s->status; >> +left_bytes = vhost_vdpa_net_cvq_cmd_page_len(); >> +} >> + >> +out.iov_base = *out_cursor; >> +in.iov_base = *in_cursor;
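The batching rule under discussion can be modeled in a few lines. The sketch below uses made-up sizes (`PAGE_LEN`, `SVQ_SLOTS`) in place of vhost_vdpa_net_cvq_cmd_page_len() and the real ring, but the flush condition mirrors the patch: flush when fewer than two descriptors are free, or when the next command does not fit in the shadow buffer:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_LEN   64   /* capacity of the shared out buffer (made up) */
#define SVQ_SLOTS  6    /* free descriptors; each command needs two    */

static size_t out_used;     /* bytes consumed in the out buffer   */
static size_t in_flight;    /* commands queued but not yet polled */
static int    free_slots = SVQ_SLOTS;

static void flush(void)
{
    printf("flush: polling %zu command(s)\n", in_flight);
    in_flight = 0;
    out_used = 0;
    free_slots = SVQ_SLOTS;     /* cursors reset, ring drained */
}

static void load_cmd(size_t cmd_size)
{
    if (free_slots < 2 || cmd_size > PAGE_LEN - out_used) {
        flush();                /* SVQ or shadow buffer is full */
    }
    out_used += cmd_size;
    free_slots -= 2;            /* one out + one in descriptor */
    in_flight++;
}

int main(void)
{
    for (int i = 0; i < 8; i++) {
        load_cmd(20);           /* forces periodic flushes */
    }
    flush();                    /* final flush for the stragglers */
    return 0;
}
```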
Re: [PATCH v4 3/8] vhost: Expose vhost_svq_available_slots()
On 2023/10/4 01:44, Eugenio Perez Martin wrote: > On Tue, Aug 29, 2023 at 7:55 AM Hawkins Jiawei wrote: >> >> Next patches in this series will delay the polling >> and checking of buffers until either the SVQ is >> full or control commands shadow buffers are full, >> no longer performing an immediate poll and check of >> the device's used buffers for each CVQ state load command. >> >> To achieve this, this patch exposes >> vhost_svq_available_slots() and introduces a helper function, >> allowing QEMU to know whether the SVQ is full. >> >> Signed-off-by: Hawkins Jiawei >> Acked-by: Eugenio Pérez >> --- >> hw/virtio/vhost-shadow-virtqueue.c | 2 +- >> hw/virtio/vhost-shadow-virtqueue.h | 1 + >> net/vhost-vdpa.c | 9 + >> 3 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/hw/virtio/vhost-shadow-virtqueue.c >> b/hw/virtio/vhost-shadow-virtqueue.c >> index e731b1d2ea..fc5f408f77 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.c >> +++ b/hw/virtio/vhost-shadow-virtqueue.c >> @@ -66,7 +66,7 @@ bool vhost_svq_valid_features(uint64_t features, Error >> **errp) >>* >>* @svq: The svq >>*/ >> -static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) >> +uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) >> { >> return svq->num_free; >> } >> diff --git a/hw/virtio/vhost-shadow-virtqueue.h >> b/hw/virtio/vhost-shadow-virtqueue.h >> index 5bce67837b..19c842a15b 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.h >> +++ b/hw/virtio/vhost-shadow-virtqueue.h >> @@ -114,6 +114,7 @@ typedef struct VhostShadowVirtqueue { >> >> bool vhost_svq_valid_features(uint64_t features, Error **errp); >> >> +uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq); >> void vhost_svq_push_elem(VhostShadowVirtqueue *svq, >>const VirtQueueElement *elem, uint32_t len); >> int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, > > I think it is ok to split this export in its own patch. If you decide > to do it that way, you can add my Acked-by. I will split this into its own patch, thanks for your suggestion! > >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index a875767ee9..e6342b213f 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -620,6 +620,13 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, >> return vhost_svq_poll(svq, 1); >> } >> >> +/* Convenience wrapper to get number of available SVQ descriptors */ >> +static uint16_t vhost_vdpa_net_svq_available_slots(VhostVDPAState *s) >> +{ >> +VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, >> 0); > > This is not really generic enough for all VhostVDPAState, as dataplane > ones have two svqs. > > I think the best is to just inline the function in the caller, as > there is only one, isn't it? If not, would it work to just replace > _net_ by _cvq_ or similar? > Yes, there should be only one user of this function; I will inline it in the caller. >> +return vhost_svq_available_slots(svq); >> +} >> + >> static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, >> uint8_t cmd, const struct iovec >> *data_sg, >> size_t data_num) >> @@ -640,6 +647,8 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> }; >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> +/* Each CVQ command has one out descriptor and one in descriptor */ >> +assert(vhost_vdpa_net_svq_available_slots(s) >= 2); >> > > I think we should remove this assertion.
By the end of the series > there is an "if" checks explicitly for the opposite condition, and > flushing the queue in that case, so the code can never reach it. > Yes, you are right. I will remove this assertion. Thanks! >> /* pack the CVQ command header */ >> memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); >> -- >> 2.25.1 >> >
Re: [PATCH v4 4/8] vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()
On 2023/10/4 01:48, Eugenio Perez Martin wrote: > On Tue, Aug 29, 2023 at 7:55 AM Hawkins Jiawei wrote: >> >> Next patches in this series will refactor vhost_vdpa_net_load_cmd() >> to iterate through the control commands shadow buffers, allowing QEMU >> to send CVQ state load commands in parallel at device startup. >> >> Considering that QEMU always forwards the CVQ command serialized >> outside of vhost_vdpa_net_load(), it is more elegant to send the >> CVQ commands directly without invoking vhost_vdpa_net_load_*() helpers. >> >> Signed-off-by: Hawkins Jiawei >> --- >> v4: >>- pack CVQ command by iov_from_buf() instead of accessing >> `out` directly suggested by Eugenio >> >> v3: >> https://lore.kernel.org/all/428a8fac2a29b37757fa15ca747be93c0226cb1f.1689748694.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 16 +--- >> 1 file changed, 13 insertions(+), 3 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index e6342b213f..7c67063469 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -1097,12 +1097,14 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { >>*/ >> static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, >> VirtQueueElement >> *elem, >> - struct iovec *out) >> + struct iovec *out, >> + const struct iovec >> *in) >> { >> struct virtio_net_ctrl_mac mac_data, *mac_ptr; >> struct virtio_net_ctrl_hdr *hdr_ptr; >> uint32_t cursor; >> ssize_t r; >> +uint8_t on = 1; >> >> /* parse the non-multicast MAC address entries from CVQ command */ >> cursor = sizeof(*hdr_ptr); >> @@ -1150,7 +1152,15 @@ static int >> vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, >>* filter table to the vdpa device, it should send the >>* VIRTIO_NET_CTRL_RX_PROMISC CVQ command to enable promiscuous mode >>*/ >> -r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 1); >> +cursor = 0; > > I think this is redundant, as "cursor" is not used by the new code and > it is already set to 0 before its usage, isn't it? > You are right, I will remove this code in the v5 patch. >> +hdr_ptr = out->iov_base; >> +out->iov_len = sizeof(*hdr_ptr) + sizeof(on); >> +assert(out->iov_len < vhost_vdpa_net_cvq_cmd_page_len()); >> + > > I think we can assume this assertion is never hit, as out->iov_len is > controlled by this function. > Thanks for your suggestion, I will remove this assertion. Thanks! > Apart from these nits, > > Acked-by: Eugenio Pérez > >> +hdr_ptr->class = VIRTIO_NET_CTRL_RX; >> +hdr_ptr->cmd = VIRTIO_NET_CTRL_RX_PROMISC; >> +iov_from_buf(out, 1, sizeof(*hdr_ptr), &on, sizeof(on)); >> +r = vhost_vdpa_net_cvq_add(s, out, 1, in, 1); >> if (unlikely(r < 0)) { >> return r; >> } >> @@ -1268,7 +1278,7 @@ static int >> vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, >>* the CVQ command direclty. >>*/ >> dev_written = vhost_vdpa_net_excessive_mac_filter_cvq_add(s, elem, >> - &out); >> +&out, &vdpa_in); >> if (unlikely(dev_written < 0)) { >> goto out; >> } >> -- >> 2.25.1 >> >
Re: [PATCH v4 7/8] vdpa: Introduce cursors to vhost_vdpa_net_loadx()
On 2023/10/4 15:21, Eugenio Perez Martin wrote: > On Tue, Aug 29, 2023 at 7:55 AM Hawkins Jiawei wrote: >> >> This patch introduces two new arguments, `out_cursor` >> and `in_cursor`, to vhost_vdpa_net_loadx(). Additionally, >> it includes a helper function >> vhost_vdpa_net_load_cursor_reset() for resetting these >> cursors. >> >> Furthermore, this patch refactors vhost_vdpa_net_load_cmd() >> so that vhost_vdpa_net_load_cmd() prepares buffers >> for the device using the cursors arguments, instead >> of directly accessing the `s->cvq_cmd_out_buffer` and >> `s->status` fields. >> >> By making these changes, next patches in this series >> can refactor vhost_vdpa_net_load_cmd() directly to >> iterate through the control commands shadow buffers, >> allowing QEMU to send CVQ state load commands in parallel >> at device startup. >> >> Signed-off-by: Hawkins Jiawei >> --- >> v4: >>- use `struct iovec` instead of `void **` as cursor >> suggested by Eugenio >>- add vhost_vdpa_net_load_cursor_reset() helper function >> to reset the cursors >>- refactor vhost_vdpa_net_load_cmd() to prepare buffers >> by iov_copy() instead of accessing `in` and `out` directly >> suggested by Eugenio >> >> v3: >> https://lore.kernel.org/all/bf390934673f2b613359ea9d7ac6c89199c31384.1689748694.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 114 --- >> 1 file changed, 77 insertions(+), 37 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index d9b8b3cf6c..a71e8c9090 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -633,7 +633,22 @@ static uint16_t >> vhost_vdpa_net_svq_available_slots(VhostVDPAState *s) >> return vhost_svq_available_slots(svq); >> } >> >> -static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, >> +static void vhost_vdpa_net_load_cursor_reset(VhostVDPAState *s, >> + struct iovec *out_cursor, >> + struct iovec *in_cursor) >> +{ >> +/* reset the cursor of the output buffer for the device */ >> +out_cursor->iov_base = s->cvq_cmd_out_buffer; >> +out_cursor->iov_len = vhost_vdpa_net_cvq_cmd_page_len(); >> + >> +/* reset the cursor of the in buffer for the device */ >> +in_cursor->iov_base = s->status; >> +in_cursor->iov_len = vhost_vdpa_net_cvq_cmd_page_len(); >> +} >> + >> +static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, >> + struct iovec *out_cursor, >> + struct iovec *in_cursor, uint8_t >> class, >> uint8_t cmd, const struct iovec >> *data_sg, >> size_t data_num) >> { >> @@ -641,28 +656,25 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, uint8_t class, >> .class = class, >> .cmd = cmd, >> }; >> -size_t data_size = iov_size(data_sg, data_num); >> -/* Buffers for the device */ >> -const struct iovec out = { >> -.iov_base = s->cvq_cmd_out_buffer, >> -.iov_len = sizeof(ctrl) + data_size, >> -}; >> -const struct iovec in = { >> -.iov_base = s->status, >> -.iov_len = sizeof(*s->status), >> -}; >> +size_t data_size = iov_size(data_sg, data_num), >> + cmd_size = sizeof(ctrl) + data_size; >> +struct iovec out, in; >> +ssize_t r; >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> /* Each CVQ command has one out descriptor and one in descriptor */ >> assert(vhost_vdpa_net_svq_available_slots(s) >= 2); >> >> -/* pack the CVQ command header */ >> -memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); >> +/* Prepare the buffer for out descriptor for the device */ >> +iov_copy(&out, 1, out_cursor, 1, 0, cmd_size); > > I may be missing something here, but cmd_size should be the bytes > available in "out", so we don't overrun it.
> >> +/* Prepare the buffer for in descriptor for the device. */ >> +iov_copy(&in, 1, in_cursor, 1, 0, sizeof(*s->status)); >> > > Same here, although it is impossible for the moment to overrun it as > all cmds only return one byte. > Here we just use iov_co
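QEMU's iov_copy() is used in this hunk only to carve a window out of the cursor iovec, not to move any data. A simplified single-element version shows the idea (the real helper handles multi-element iovecs, and the v4 patch advances the cursor with iov_discard_front() rather than the explicit offsets used here):

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/uio.h>

/* Single-iovec version of the windowing: carve `bytes` starting at
 * `offset` out of the cursor, without copying any payload. */
static struct iovec iov_window(const struct iovec *cursor,
                               size_t offset, size_t bytes)
{
    struct iovec win = {
        .iov_base = (char *)cursor->iov_base + offset,
        .iov_len  = bytes,
    };
    return win;
}

int main(void)
{
    char page[64];
    struct iovec out_cursor = { .iov_base = page, .iov_len = sizeof(page) };

    /* First command claims 10 bytes, the second one the next 6. */
    struct iovec cmd1 = iov_window(&out_cursor, 0, 10);
    struct iovec cmd2 = iov_window(&out_cursor, 10, 6);

    printf("cmd1 at +%td len %zu, cmd2 at +%td len %zu\n",
           (char *)cmd1.iov_base - page, cmd1.iov_len,
           (char *)cmd2.iov_base - page, cmd2.iov_len);
    return 0;
}
```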
Re: [PATCH v4 8/8] vdpa: Send cvq state load commands in parallel
On 2023/10/4 15:33, Eugenio Perez Martin wrote: > On Tue, Aug 29, 2023 at 7:55 AM Hawkins Jiawei wrote: >> >> This patch enables sending CVQ state load commands >> in parallel at device startup by the following steps: >> >>* Refactor vhost_vdpa_net_load_cmd() to iterate through >> the control commands shadow buffers. This allows different >> CVQ state load commands to use their own unique buffers. >> >>* Delay the polling and checking of buffers until either >> the SVQ is full or control commands shadow buffers are full. >> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 >> Signed-off-by: Hawkins Jiawei >> --- >> v4: >>- refactor argument `cmds_in_flight` to `len` for >> vhost_vdpa_net_svq_full() >>- check the return value of vhost_vdpa_net_svq_poll() >> in vhost_vdpa_net_svq_flush() suggested by Eugenio >>- use iov_size(), vhost_vdpa_net_load_cursor_reset() >> and iov_discard_front() to update the cursors instead of >> accessing it directly according to Eugenio >> >> v3: >> https://lore.kernel.org/all/3a002790e6c880af928c6470ecbf03e7c65a68bb.1689748694.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 155 +-- >> 1 file changed, 97 insertions(+), 58 deletions(-) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index a71e8c9090..818464b702 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -646,6 +646,31 @@ static void >> vhost_vdpa_net_load_cursor_reset(VhostVDPAState *s, >> in_cursor->iov_len = vhost_vdpa_net_cvq_cmd_page_len(); >> } >> >> +/* >> + * Poll SVQ for multiple pending control commands and check the device's >> ack. >> + * >> + * Caller should hold the BQL when invoking this function. >> + * >> + * @s: The VhostVDPAState >> + * @len: The length of the pending status shadow buffer >> + */ >> +static ssize_t vhost_vdpa_net_svq_flush(VhostVDPAState *s, size_t len) >> +{ >> +/* Device uses a one-byte length ack for each control command */ >> +ssize_t dev_written = vhost_vdpa_net_svq_poll(s, len); >> +if (unlikely(dev_written != len)) { >> +return -EIO; >> +} >> + >> +/* check the device's ack */ >> +for (int i = 0; i < len; ++i) { >> +if (s->status[i] != VIRTIO_NET_OK) { >> +return -EIO; >> +} >> +} >> +return 0; >> +} >> + >> static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, >> struct iovec *out_cursor, >> struct iovec *in_cursor, uint8_t >> class, >> @@ -660,10 +685,30 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState >> *s, >> cmd_size = sizeof(ctrl) + data_size; >> struct iovec out, in; >> ssize_t r; >> +unsigned dummy_cursor_iov_cnt; >> >> assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); >> +if (vhost_vdpa_net_svq_available_slots(s) < 2 || >> +iov_size(out_cursor, 1) < cmd_size) { >> +/* >> + * It is time to flush all pending control commands if SVQ is full >> + * or control commands shadow buffers are full. >> + * >> + * We can poll here since we've had BQL from the time >> + * we sent the descriptor. >> + */ >> +r = vhost_vdpa_net_svq_flush(s, in_cursor->iov_base - >> + (void *)s->status); >> +if (unlikely(r < 0)) { >> +return r; >> +} >> + >> +vhost_vdpa_net_load_cursor_reset(s, out_cursor, in_cursor); >> +} >> + > > It would be great to merge this flush with the one at > vhost_vdpa_net_load. We would need to return ENOSPC or similar and > handle it there. > > But it would make it more difficult to iterate through the loading of > the different parameters, so I think it can be done on top. > Hi Eugenio, Please forgive my poor English. Do you mean that the approach in my patch is acceptable to you?
>> /* Each CVQ command has one out descriptor and one in descriptor */ >> assert(vhost_vdpa_net_svq_available_slots(s) >= 2); >> +assert(iov_size(out_cursor, 1) >= cmd_size); >> > > Same here, I think we can avoid the assertion, right? You are right, I will remove this assertion. Thanks! > > Apart from that, > > Acked-by:
Re: [PATCH v3 0/3] vdpa: Return -EIO if device ack is VIRTIO_NET_ERR
On 2023/7/5 15:59, Lei Yang wrote: > Hello Hawkins > > QE can help test this series before it is merged into master, I would > like to know what test steps can cover this series related scenario? > Hi, I would like to suggest the following steps to test this patch series: 1. Modify the QEMU source code to make the device return a VIRTIO_NET_ERR for the CVQ command. Please apply the patch provided below: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 373609216f..58ade6d4e0 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -642,7 +642,7 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_MAC_ADDR)) { ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC, VIRTIO_NET_CTRL_MAC_ADDR_SET, - n->mac, sizeof(n->mac)); + n->mac, sizeof(n->mac) - 1); if (unlikely(dev_written < 0)) { return dev_written; } 2. Start QEMU with the vdpa device in default state. Without the patch series, QEMU should not trigger any errors or warnings. With the series applied, QEMU should trigger the warning like "qemu-system-x86_64: unable to start vhost net: 5: falling back on userspace virtio". Thanks! > Thanks > Lei > > On Tue, Jul 4, 2023 at 11:36 AM Hawkins Jiawei wrote: >> >> According to VirtIO standard, "The class, command and >> command-specific-data are set by the driver, >> and the device sets the ack byte. >> There is little it can do except issue a diagnostic >> if ack is not VIRTIO_NET_OK." >> >> Therefore, QEMU should stop sending the queued SVQ commands and >> cancel the device startup if the device's ack is not VIRTIO_NET_OK. >> >> Yet the problem is that, vhost_vdpa_net_load_x() returns 1 based on >> `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR. >> As a result, net->nc->info->load() also returns 1, this makes >> vhost_net_start_one() incorrectly assume the device state is >> successfully loaded by vhost_vdpa_net_load() and return 0, instead of >> goto `fail` label to cancel the device startup, as vhost_net_start_one() >> only cancels the device startup when net->nc->info->load() returns a >> negative value. >> >> This patchset fixes this problem by returning -EIO when the device's >> ack is not VIRTIO_NET_OK. >> >> Changelog >> = >> v3: >> - split the fixes suggested by Eugenio >> - return -EIO suggested by Michael >> >> v2: >> https://lore.kernel.org/all/69010e9ebb5e3729aef595ed92840f43e48e53e5.1687875592.git.yin31...@gmail.com/ >> - fix the same bug in vhost_vdpa_net_load_offloads() >> >> v1: https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/ >> >> Hawkins Jiawei (3): >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac() >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq() >>vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads() >> >> net/vhost-vdpa.c | 15 +++ >> 1 file changed, 11 insertions(+), 4 deletions(-) >> >> -- >> 2.25.1 >> >> >
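The fix hinges on the return-value contract between load() and vhost_net_start_one(): startup is cancelled only for negative values, which is why a VIRTIO_NET_ERR ack must become -EIO rather than a positive byte count. A toy sketch of that contract (illustrative callbacks, not the QEMU functions):

```c
#include <stdio.h>

/* Sketch of the contract the series fixes: only a negative load()
 * result cancels device startup. */
static int start_one(int (*load)(void))
{
    int r = load();
    if (r < 0) {
        printf("load failed (%d): cancelling startup\n", r);
        return r;
    }
    printf("load ok: device started\n");
    return 0;
}

static int load_old(void)   { return 1; }   /* VIRTIO_NET_ERR leaked as 1 */
static int load_fixed(void) { return -5; }  /* -EIO */

int main(void)
{
    start_one(load_old);     /* wrongly reports success */
    start_one(load_fixed);   /* correctly cancels startup */
    return 0;
}
```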
[PATCH v3 6/7] vdpa: Avoid forwarding large CVQ command failures
Due to the size limitation of the out buffer sent to the vdpa device, which is determined by vhost_vdpa_net_cvq_cmd_len(), an excessive CVQ command is truncated in QEMU. As a result, the vdpa device rejects this flawed CVQ command. However, the problem is that the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command has a variable length, which may exceed vhost_vdpa_net_cvq_cmd_len() if the guest sets more than `MAC_TABLE_ENTRIES` MAC addresses for the filter table. This patch solves this problem by the following steps: * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(), which represents the size of the buffer that is allocated and mmapped. This ensures that everything works correctly as long as the guest sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() - sizeof(struct virtio_net_ctrl_hdr) - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses. Considering the highly unlikely scenario of the guest setting more than that number of MAC addresses for the filter table, this should work fine for the majority of cases. * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(), instead of directly sending this CVQ command, QEMU should send a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to the vdpa device. Additionally, a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including (`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided to the device model. By doing so, the vdpa device turns promiscuous mode on, aligning with the VirtIO standard. The device model marks `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`, which aligns with the state of the vdpa device. Note that the bug cannot be triggered at the moment, since the VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ. Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device") Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 162 ++- 1 file changed, 161 insertions(+), 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index a84eb088a0..a4ff6c52b7 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -916,6 +916,148 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { .check_peer_type = vhost_vdpa_check_peer_type, }; +/* + * Forward the excessive VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command to + * vdpa device. + * + * Considering that QEMU cannot send the entire filter table to the + * vdpa device, it should send the VIRTIO_NET_CTRL_RX_PROMISC CVQ + * command to enable promiscuous mode to receive all packets, + * according to VirtIO standard, "Since there are no guarantees, + * it can use a hash filter or silently switch to allmulti or + * promiscuous mode if it is given too many addresses.". + * + * Since QEMU ignores MAC addresses beyond `MAC_TABLE_ENTRIES` and + * marks `n->mac_table.x_overflow` accordingly, it should have + * the same effect on the device model to receive + * (`MAC_TABLE_ENTRIES` + 1) or more non-multicast MAC addresses. + * The same applies to multicast MAC addresses. + * + * Therefore, QEMU can provide the device model with a fake + * VIRTIO_NET_CTRL_MAC_TABLE_SET command with (`MAC_TABLE_ENTRIES` + 1) + * non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast + * MAC addresses. This ensures that the device model marks + * `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`, + * allowing all packets to be received, which aligns with the + * state of the vdpa device.
+ */ +static int vhost_vdpa_net_excessive_mac_filter_cvq_add(VhostVDPAState *s, + VirtQueueElement *elem, + struct iovec *out) +{ +struct virtio_net_ctrl_mac mac_data, *mac_ptr; +struct virtio_net_ctrl_hdr *hdr_ptr; +uint32_t cursor; +ssize_t r; + +/* parse the non-multicast MAC address entries from CVQ command */ +cursor = sizeof(*hdr_ptr); +r = iov_to_buf(elem->out_sg, elem->out_num, cursor, + &mac_data, sizeof(mac_data)); +if (unlikely(r != sizeof(mac_data))) { +/* + * If the CVQ command is invalid, we should simulate the vdpa device + * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command + */ +*s->status = VIRTIO_NET_ERR; +return sizeof(*s->status); +} +cursor += sizeof(mac_data) + le32_to_cpu(mac_data.entries) * ETH_ALEN; + +/* parse the multicast MAC address entries from CVQ command */ +r = iov_to_buf(elem->out_sg, elem->out_num, cursor, + &mac_data, sizeof(mac_data)); +if (r != sizeof(mac_data)) { +/* + * If the CVQ command is invalid, we should simulate the vdpa device + * to reject the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command + */ +*s->status = VIRTIO_NET_ERR; +return sizeof(*s->status); +
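The capacity bound quoted in the commit message is easy to sanity-check. Assuming the out buffer is one 4 KiB page (what vhost_vdpa_net_cvq_cmd_page_len() amounts to on a 4 KiB-page host), the arithmetic works out to 681 addresses:

```c
#include <stdint.h>
#include <stdio.h>

/* Back-of-the-envelope check of the commit message's capacity claim.
 * PAGE_LEN is an assumption standing in for the real page length. */
#define PAGE_LEN 4096
#define ETH_ALEN 6

struct virtio_net_ctrl_hdr { uint8_t class, cmd; } __attribute__((packed));
struct virtio_net_ctrl_mac { uint32_t entries; }  __attribute__((packed));

int main(void)
{
    size_t room = PAGE_LEN - sizeof(struct virtio_net_ctrl_hdr)
                  - 2 * sizeof(struct virtio_net_ctrl_mac);
    printf("MAC_TABLE_SET fits up to %zu addresses in one page\n",
           room / ETH_ALEN);   /* (4096 - 2 - 8) / 6 = 681 */
    return 0;
}
```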
[PATCH v3 2/7] vdpa: Restore MAC address filtering state
This patch refactors vhost_vdpa_net_load_mac() to restore the MAC address filtering state at device's startup. Signed-off-by: Hawkins Jiawei --- v3: - return early if mismatch the condition suggested by Eugenio v2: https://lore.kernel.org/all/2f2560f749186c0eb1055f9926f464587e419eeb.1688051252.git.yin31...@gmail.com/ - use iovec suggested by Eugenio - avoid sending CVQ command in default state v1: https://lore.kernel.org/all/00f72fe154a882fd6dc15bc39e3a1ac63f9dadce.1687402580.git.yin31...@gmail.com/ net/vhost-vdpa.c | 52 1 file changed, 52 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 31ef6ad6ec..7189ccafaf 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -660,6 +660,58 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) } } +/* + * According to VirtIO standard, "The device MUST have an + * empty MAC filtering table on reset.". + * + * Therefore, there is no need to send this CVQ command if the + * driver also sets an empty MAC filter table, which aligns with + * the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX) || +n->mac_table.in_use == 0) { +return 0; +} + +uint32_t uni_entries = n->mac_table.first_multi, + uni_macs_size = uni_entries * ETH_ALEN, + mul_entries = n->mac_table.in_use - uni_entries, + mul_macs_size = mul_entries * ETH_ALEN; +struct virtio_net_ctrl_mac uni = { +.entries = cpu_to_le32(uni_entries), +}; +struct virtio_net_ctrl_mac mul = { +.entries = cpu_to_le32(mul_entries), +}; +const struct iovec data[] = { +{ +.iov_base = &uni, +.iov_len = sizeof(uni), +}, { +.iov_base = n->mac_table.macs, +.iov_len = uni_macs_size, +}, { +.iov_base = &mul, +.iov_len = sizeof(mul), +}, { +.iov_base = &n->mac_table.macs[uni_macs_size], +.iov_len = mul_macs_size, +}, +}; +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, +VIRTIO_NET_CTRL_MAC, +VIRTIO_NET_CTRL_MAC_TABLE_SET, +data, ARRAY_SIZE(data)); +if (unlikely(dev_written < 0)) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} + return 0; } -- 2.25.1
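The four-element iovec in this patch mirrors the wire format the spec mandates: a le32 count, the unicast MACs, another le32 count, then the multicast MACs. A standalone sketch with made-up addresses (counts are left host-endian here for brevity; the patch converts them with cpu_to_le32()):

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/uio.h>

#define ETH_ALEN 6

int main(void)
{
    uint32_t uni_entries = 1, mul_entries = 1;          /* host-endian demo */
    uint8_t uni_macs[1][ETH_ALEN] = { { 0x52, 0x54, 0, 0, 0, 1 } };
    uint8_t mul_macs[1][ETH_ALEN] = { { 0x01, 0x00, 0x5e, 0, 0, 1 } };

    /* Same shape as the data[] array in the patch. */
    const struct iovec data[] = {
        { .iov_base = &uni_entries, .iov_len = sizeof(uni_entries) },
        { .iov_base = uni_macs,     .iov_len = sizeof(uni_macs)    },
        { .iov_base = &mul_entries, .iov_len = sizeof(mul_entries) },
        { .iov_base = mul_macs,     .iov_len = sizeof(mul_macs)    },
    };

    size_t total = 0;
    for (size_t i = 0; i < sizeof(data) / sizeof(data[0]); i++) {
        total += data[i].iov_len;   /* what iov_size() computes in QEMU */
    }
    printf("command-specific data is %zu bytes\n", total);  /* 4+6+4+6 = 20 */
    return 0;
}
```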
[PATCH v3 3/7] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature
This patch introduces vhost_vdpa_net_load_rx_mode() and vhost_vdpa_net_load_rx() to restore the packet receive filtering state in relation to VIRTIO_NET_F_CTRL_RX feature at device's startup. Signed-off-by: Hawkins Jiawei --- v3: - return early if mismatch the condition suggested by Eugenio - remove the `on` variable suggested by Eugenio v2: https://lore.kernel.org/all/d9d7641ef25d7a4477f8fc4df8cba026380dab76.1688051252.git.yin31...@gmail.com/ - avoid sending CVQ command in default state suggested by Eugenio v1: https://lore.kernel.org/all/86eeddcd6f6b04e5c1e44e901ddea3b1b8b6c183.1687402580.git.yin31...@gmail.com/ net/vhost-vdpa.c | 85 1 file changed, 85 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 7189ccafaf..e80d4b4ef3 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -788,6 +788,87 @@ static int vhost_vdpa_net_load_offloads(VhostVDPAState *s, return 0; } +static int vhost_vdpa_net_load_rx_mode(VhostVDPAState *s, + uint8_t cmd, + uint8_t on) +{ +const struct iovec data = { +.iov_base = &on, +.iov_len = sizeof(on), +}; +return vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_RX, + cmd, &data, 1); +} + +static int vhost_vdpa_net_load_rx(VhostVDPAState *s, + const VirtIONet *n) +{ +ssize_t dev_written; + +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) { +return 0; +} + +/* + * According to virtio_net_reset(), device turns promiscuous mode + * on by default. + * + * Additionally, according to VirtIO standard, "Since there are + * no guarantees, it can use a hash filter or silently switch to + * allmulti or promiscuous mode if it is given too many addresses.". + * QEMU marks `n->mac_table.uni_overflow` if guest sets too many + * non-multicast MAC addresses, indicating that promiscuous mode + * should be enabled. + * + * Therefore, QEMU should only send this CVQ command if the + * `n->mac_table.uni_overflow` is not marked and `n->promisc` is off, + * which sets promiscuous mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (!n->mac_table.uni_overflow && !n->promisc) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_PROMISC, 0); +if (unlikely(dev_written < 0)) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + +/* + * According to virtio_net_reset(), device turns all-multicast mode + * off by default. + * + * According to VirtIO standard, "Since there are no guarantees, + * it can use a hash filter or silently switch to allmulti or + * promiscuous mode if it is given too many addresses.". QEMU marks + * `n->mac_table.multi_overflow` if guest sets too many + * multicast MAC addresses. + * + * Therefore, QEMU should only send this CVQ command if the + * `n->mac_table.multi_overflow` is marked or `n->allmulti` is on, + * which sets all-multicast mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration.
+ */ +if (n->mac_table.multi_overflow || n->allmulti) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_ALLMULTI, 1); +if (unlikely(dev_written < 0)) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + +return 0; +} + static int vhost_vdpa_net_load(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -814,6 +895,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) if (unlikely(r)) { return r; } +r = vhost_vdpa_net_load_rx(s, n); +if (unlikely(r)) { +return r; +} return 0; } -- 2.25.1
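The pattern in this patch — send a command only when the migrated state differs from the post-reset defaults — condenses to a few lines. A toy model (field names mirror the device model; printf stands in for the CVQ command):

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the "skip commands that match reset defaults" rule
 * from vhost_vdpa_net_load_rx(). */
struct model {
    bool promisc;          /* reset default: on  */
    bool allmulti;         /* reset default: off */
    bool uni_overflow;     /* too many unicast MACs seen   */
    bool multi_overflow;   /* too many multicast MACs seen */
};

static void restore_rx(const struct model *n)
{
    /* PROMISC defaults to on, so only an explicit "off" needs a command,
     * and unicast overflow forces promiscuous mode anyway. */
    if (!n->uni_overflow && !n->promisc) {
        printf("send VIRTIO_NET_CTRL_RX_PROMISC 0\n");
    }
    /* ALLMULTI defaults to off, so only "on" (or overflow) is sent. */
    if (n->multi_overflow || n->allmulti) {
        printf("send VIRTIO_NET_CTRL_RX_ALLMULTI 1\n");
    }
}

int main(void)
{
    struct model fresh = { .promisc = true };   /* matches reset defaults */
    struct model migrated = { .promisc = false, .allmulti = true };

    restore_rx(&fresh);      /* prints nothing: nothing to restore */
    restore_rx(&migrated);   /* prints both commands */
    return 0;
}
```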
[PATCH v3 5/7] vdpa: Accessing CVQ header through its structure
We can access the CVQ header through `struct virtio_net_ctrl_hdr`, instead of accessing it through a `uint8_t` pointer, which improves the code's readability and maintainability. Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index e80d4b4ef3..a84eb088a0 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -928,6 +928,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, { VhostVDPAState *s = opaque; size_t in_len; +const struct virtio_net_ctrl_hdr *ctrl; virtio_net_ctrl_ack status = VIRTIO_NET_ERR; /* Out buffer sent to both the vdpa device and the device model */ struct iovec out = { @@ -943,7 +944,9 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0, s->cvq_cmd_out_buffer, vhost_vdpa_net_cvq_cmd_len()); -if (*(uint8_t *)s->cvq_cmd_out_buffer == VIRTIO_NET_CTRL_ANNOUNCE) { + +ctrl = s->cvq_cmd_out_buffer; +if (ctrl->class == VIRTIO_NET_CTRL_ANNOUNCE) { /* * Guest announce capability is emulated by qemu, so don't forward to * the device. -- 2.25.1
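A minimal before/after of the change (struct layout copied from the VirtIO control header; the first byte 3 is VIRTIO_NET_CTRL_ANNOUNCE per the spec):

```c
#include <stdint.h>
#include <stdio.h>

struct virtio_net_ctrl_hdr {
    uint8_t class;
    uint8_t cmd;
} __attribute__((packed));

int main(void)
{
    uint8_t cvq_cmd_out_buffer[16] = { 3 /* VIRTIO_NET_CTRL_ANNOUNCE */, 0 };

    /* old style: a raw byte at a magic offset */
    uint8_t klass_raw = *(uint8_t *)cvq_cmd_out_buffer;

    /* new style: named fields through the header struct */
    const struct virtio_net_ctrl_hdr *ctrl =
        (const void *)cvq_cmd_out_buffer;

    printf("raw=%u struct: class=%u cmd=%u\n",
           (unsigned)klass_raw, (unsigned)ctrl->class, (unsigned)ctrl->cmd);
    return 0;
}
```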
[PATCH v3 4/7] vhost: Fix false positive out-of-bounds
QEMU uses vhost_svq_translate_addr() to translate addresses between QEMU's virtual addresses and the SVQ IOVA. In order to validate this translation, QEMU checks whether the translated range falls within the mapped range. Yet the problem is that the value of `needle_last`, calculated as `needle.translated_addr + iovec[i].iov_len`, is the exclusive boundary of the translated range, rather than the last inclusive address of the range that the check requires. Consequently, QEMU fails the check when the translated range matches the size of the mapped range. This patch solves this problem by fixing the `needle_last` value to the last inclusive address of the translated range. Note that this bug cannot be triggered at the moment, because QEMU is unable to translate such a big range due to the truncation of the CVQ command in vhost_vdpa_net_handle_ctrl_avail(). Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ") Signed-off-by: Hawkins Jiawei --- hw/virtio/vhost-shadow-virtqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 1b1d85306c..49e5aed931 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -111,7 +111,7 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq, addrs[i] = map->iova + off; needle_last = int128_add(int128_make64(needle.translated_addr), - int128_make64(iovec[i].iov_len)); + int128_makes64(iovec[i].iov_len - 1)); map_last = int128_make64(map->translated_addr + map->size); if (unlikely(int128_gt(needle_last, map_last))) { qemu_log_mask(LOG_GUEST_ERROR, -- 2.25.1
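A worked example makes the off-by-one visible. Assuming the tree stores a mapping's size as its length minus one — which is what comparing against `map->translated_addr + map->size` as an inclusive last address implies — a buffer that exactly covers the mapping used to be rejected:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A 0x1000-byte mapping at 0x10000; the stored size is the length
     * minus one, so base + size is the last valid (inclusive) address. */
    uint64_t map_base = 0x10000, map_size = 0x1000 - 1;
    uint64_t map_last = map_base + map_size;            /* 0x10fff */

    /* A buffer that covers the whole mapping. */
    uint64_t buf_addr = 0x10000, buf_len = 0x1000;

    uint64_t wrong = buf_addr + buf_len;        /* 0x11000: exclusive end  */
    uint64_t fixed = buf_addr + buf_len - 1;    /* 0x10fff: inclusive last */

    printf("wrong check: %s\n", wrong > map_last ? "rejected" : "accepted");
    printf("fixed check: %s\n", fixed > map_last ? "rejected" : "accepted");
    return 0;
}
```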
[PATCH v3 7/7] vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_RX feature. Signed-off-by: Hawkins Jiawei Acked-by: Eugenio Pérez --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index a4ff6c52b7..0994836f8c 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -104,6 +104,7 @@ static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | BIT_ULL(VIRTIO_NET_F_STATUS) | BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | +BIT_ULL(VIRTIO_NET_F_CTRL_RX) | BIT_ULL(VIRTIO_NET_F_MQ) | BIT_ULL(VIRTIO_F_ANY_LAYOUT) | BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) | -- 2.25.1
[PATCH v3 1/7] vdpa: Use iovec for vhost_vdpa_net_load_cmd()
According to the VirtIO standard, "The driver MUST follow the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number, followed by that number of non-multicast MAC addresses, followed by another le32 number, followed by that number of multicast addresses." Considering that this data is not stored in contiguous memory, this patch refactors vhost_vdpa_net_load_cmd() to accept scattered data, eliminating the need for an additional data copy or packing the data into s->cvq_cmd_out_buffer outside of vhost_vdpa_net_load_cmd(). Signed-off-by: Hawkins Jiawei --- v3: - rename arguments to `data_sg` and `data_num` - use iov_to_buf() suggested by Eugenio v2: https://lore.kernel.org/all/6d3dc0fc076564a03501e222ef1102a6a7a643af.1688051252.git.yin31...@gmail.com/ - refactor vhost_vdpa_load_cmd() to accept iovec suggested by Eugenio net/vhost-vdpa.c | 33 + 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 373609216f..31ef6ad6ec 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -620,29 +620,38 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s, size_t out_len, } static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class, - uint8_t cmd, const void *data, - size_t data_size) + uint8_t cmd, const struct iovec *data_sg, + size_t data_num) { const struct virtio_net_ctrl_hdr ctrl = { .class = class, .cmd = cmd, }; +size_t data_size = iov_size(data_sg, data_num); assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl)); +/* pack the CVQ command header */ memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl)); -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size); -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size, +/* pack the CVQ command command-specific-data */ +iov_to_buf(data_sg, data_num, 0, + s->cvq_cmd_out_buffer + sizeof(ctrl), data_size); + +return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl), sizeof(virtio_net_ctrl_ack)); } static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n) { if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_MAC_ADDR)) { +const struct iovec data = { +.iov_base = (void *)n->mac, +.iov_len = sizeof(n->mac), +}; ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC, VIRTIO_NET_CTRL_MAC_ADDR_SET, - n->mac, sizeof(n->mac)); + &data, 1); if (unlikely(dev_written < 0)) { return dev_written; } @@ -665,9 +674,13 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s, } mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs); +const struct iovec data = { +.iov_base = &mq, +.iov_len = sizeof(mq), +}; dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ, - VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &mq, - sizeof(mq)); + VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, + &data, 1); if (unlikely(dev_written < 0)) { return dev_written; } @@ -706,9 +719,13 @@ static int vhost_vdpa_net_load_offloads(VhostVDPAState *s, } offloads = cpu_to_le64(n->curr_guest_offloads); +const struct iovec data = { +.iov_base = &offloads, +.iov_len = sizeof(offloads), +}; dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_GUEST_OFFLOADS, VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, - &offloads, sizeof(offloads)); + &data, 1); if (unlikely(dev_written < 0)) { return dev_written; } -- 2.25.1
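The iov_to_buf() call this patch adds linearizes the scattered command-specific data into the contiguous out buffer. A self-contained sketch of that flattening (the starting-offset parameter of QEMU's real helper is omitted here):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

/* Minimal stand-in for QEMU's iov_to_buf(): copy scattered iovec data
 * into one contiguous buffer. */
static size_t iov_to_buf_sketch(const struct iovec *iov, size_t iov_cnt,
                                void *buf, size_t buf_len)
{
    size_t done = 0;
    for (size_t i = 0; i < iov_cnt && done < buf_len; i++) {
        size_t n = iov[i].iov_len;
        if (n > buf_len - done) {
            n = buf_len - done;
        }
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

int main(void)
{
    /* e.g. the MAC filter table: counts and addresses live in different
     * objects, so they arrive as separate iovec entries. */
    uint32_t count = 2;
    uint8_t macs[2][6] = { { 1 }, { 2 } };
    struct iovec data_sg[] = {
        { .iov_base = &count, .iov_len = sizeof(count) },
        { .iov_base = macs,   .iov_len = sizeof(macs)  },
    };

    uint8_t out[64];
    size_t n = iov_to_buf_sketch(data_sg, 2, out, sizeof(out));
    printf("packed %zu bytes\n", n);   /* 4 + 12 = 16 */
    return 0;
}
```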
[PATCH v3 0/7] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support
dpa: Restore packet receive filtering state relative with _F_CTRL_RX feature" - remove the `on` variable suggested by Eugenio in patch 3 "vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature" - fix possible false positive out-of-bounds in patch 4 "vhost: Fix false positive out-of-bounds" - avoid forwarding large CVQ command failures suggested by Eugenio by patch 5 "vdpa: Accessing CVQ header through its structure" and patch 6 "vdpa: Avoid forwarding large CVQ command failures" v2: https://lore.kernel.org/all/cover.1688051252.git.yin31...@gmail.com/ - refactor vhost_vdpa_net_load_cmd() to accept iovec suggested by Eugenio - avoid sending CVQ command in default state suggested by Eugenio v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04423.html Hawkins Jiawei (7): vdpa: Use iovec for vhost_vdpa_net_load_cmd() vdpa: Restore MAC address filtering state vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature vhost: Fix false positive out-of-bounds vdpa: Accessing CVQ header through its structure vdpa: Avoid forwarding large CVQ command failures vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ hw/virtio/vhost-shadow-virtqueue.c | 2 +- net/vhost-vdpa.c | 338 - 2 files changed, 329 insertions(+), 11 deletions(-) -- 2.25.1
[PATCH 0/1] vdpa: Fix possible use-after-free for VirtQueueElement
This patch fixes the problem that vhost_vdpa_net_handle_ctrl_avail() mistakenly frees the `elem`, even if it fails to forward the CVQ command to the vdpa device. This can result in a use-after-free. TestStep 1. test the patch using vp-vdpa device - For L0 guest, boot QEMU with virtio-net-pci net device with `ctrl_vq` feature on, something like: -device virtio-net-pci,rx_queue_size=256,tx_queue_size=256, iommu_platform=on,ctrl_vq=on,... - For L1 guest, apply the patch series, then apply an additional patch to make vhost_vdpa_net_handle_ctrl_avail() fail to process the CVQ command as below: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index d8f37694ac..1f22355a41 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -797,7 +797,8 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, dev_written = sizeof(status); *s->status = VIRTIO_NET_OK; } else { -dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status)); +//dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status)); +dev_written = -EINVAL; if (unlikely(dev_written < 0)) { goto out; } start QEMU with vdpa device with svq mode and enable the `ctrl_vq` feature on, something like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,ctrl_vq=on,... With this series, QEMU should not trigger any error or warning. Without this series, QEMU should fail with "free(): double free detected in tcache 2". Hawkins Jiawei (1): vdpa: Fix possible use-after-free for VirtQueueElement net/vhost-vdpa.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) -- 2.25.1
[PATCH 1/1] vdpa: Fix possible use-after-free for VirtQueueElement
QEMU uses vhost_handle_guest_kick() to forward the guest's available buffers to the vdpa device in the SVQ avail ring. In vhost_handle_guest_kick(), a `g_autofree` `elem` is used to iterate through the available VirtQueueElements. This `elem` is then passed to `svq->ops->avail_handler`, specifically to vhost_vdpa_net_handle_ctrl_avail(). If this handler fails to process the CVQ command, vhost_handle_guest_kick() regains ownership of the `elem`, and either frees it or requeues it. Yet the problem is that vhost_vdpa_net_handle_ctrl_avail() mistakenly frees the `elem`, even if it fails to forward the CVQ command to the vdpa device. This can result in a use-after-free for the `elem` in vhost_handle_guest_kick(). This patch solves this problem by refactoring vhost_vdpa_net_handle_ctrl_avail() to free the `elem` only when it owns it. Fixes: bd907ae4b0 ("vdpa: manual forward CVQ buffers") Signed-off-by: Hawkins Jiawei --- net/vhost-vdpa.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 373609216f..d8f37694ac 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -825,7 +825,16 @@ out: error_report("Bad device CVQ written length"); } vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status))); -g_free(elem); +/* + * `elem` belongs to vhost_vdpa_net_handle_ctrl_avail() only when + * the function successfully forwards the CVQ command, indicated + * by a non-negative value of `dev_written`. Otherwise, it still + * belongs to SVQ. + * This function should only free the `elem` when it owns it. + */ +if (dev_written >= 0) { +g_free(elem); +} return dev_written < 0 ? dev_written : 0; } -- 2.25.1
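The ownership rule is worth seeing in miniature. A toy reproduction (plain malloc/free in place of g_autofree and VirtQueueElement) of why the guard prevents the double free:

```c
#include <stdio.h>
#include <stdlib.h>

/* The handler may free the element only on the success path; on failure
 * the caller still owns it and will free (or requeue) it itself. */
static int handler(void *elem, int fail)
{
    int dev_written = fail ? -22 /* -EINVAL */ : 1;

    if (dev_written >= 0) {
        free(elem);              /* success: the handler owns elem now */
    }
    return dev_written < 0 ? dev_written : 0;
}

int main(void)
{
    void *elem = malloc(16);
    if (handler(elem, 1) < 0) {
        free(elem);              /* failure: the caller kept ownership */
    }

    elem = malloc(16);
    handler(elem, 0);            /* success: no second free happens */
    return 0;
}
```

Without the `dev_written >= 0` guard, the failure path would free once in the handler and once in the caller, matching the "double free detected in tcache 2" abort from the cover letter.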
[PATCH v3 0/2] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX_EXTRA commands support
This series enables shadowed CVQ to intercept rx commands related to VIRTIO_NET_F_CTRL_RX_EXTRA feature through shadowed CVQ, update the virtio NIC device model so qemu send it in a migration, and the restore of that rx state in the destination. To test this patch series, one should modify the `n->parent_obj.guest_features` value in vhost_vdpa_net_load_rx() using gdb, as the linux virtio-net driver does not currently support the VIRTIO_NET_F_CTRL_RX_EXTRA feature. Note that this patch should be based on [1] patch "Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support" [1]. https://lore.kernel.org/all/cover.1688743107.git.yin31...@gmail.com/ TestStep 1. test the patch series using vp-vdpa device - For L0 guest, boot QEMU with virtio-net-pci net device with `ctrl_vq`, `ctrl_rx` and `ctrl_rx_extra` feature on, something like: -device virtio-net-pci,rx_queue_size=256,tx_queue_size=256, iommu_platform=on,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on... - For L1 guest, apply the patch series and compile the code, start QEMU with vdpa device with svq mode and enable the `ctrl_vq`, `ctrl_rx` and `ctrl_rx_extra` feature on, something like: -netdev type=vhost-vdpa,x-svq=true,... -device virtio-net-pci,ctrl_vq=on,ctrl_rx=on,ctrl_rx_extra=on... Use gdb to attach the VM and break at the net/vhost-vdpa.c:870. With this series, gdb can hit the breakpoint. Enable the VIRTIO_NET_F_CTRL_RX_EXTRA feature and enable the non-unicast mode by entering the following gdb commands: ```gdb set n->parent_obj.guest_features |= (1 << 20) set n->nouni = 1 c ``` QEMU should not trigger any errors or warnings. Without this series, QEMU should fail with "x-svq=true: vdpa svq does not work with features 0x10". ChangeLog = v3: - return early if mismatch the condition suggested by Eugenio in patch 1 "vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature" - remove the `on` variable suggested by Eugenio in patch 1 "vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature" v2: https://lore.kernel.org/all/cover.1688365324.git.yin31...@gmail.com/ - avoid sending CVQ command in default state suggested by Eugenio v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04956.html Hawkins Jiawei (2): vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ net/vhost-vdpa.c | 89 1 file changed, 89 insertions(+) -- 2.25.1
[PATCH v3 1/2] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
This patch refactors vhost_vdpa_net_load_rx() to restore the packet receive filtering state in relation to VIRTIO_NET_F_CTRL_RX_EXTRA feature at device's startup. Signed-off-by: Hawkins Jiawei --- v3: - return early if mismatch the condition suggested by Eugenio - remove the `on` variable suggested by Eugenio v2: https://lore.kernel.org/all/66ec4d7e3a680de645043d0331ab65940154f2b8.1688365324.git.yin31...@gmail.com/ - avoid sending CVQ command in default state suggested by Eugenio v1: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg04957.html net/vhost-vdpa.c | 88 1 file changed, 88 insertions(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 0994836f8c..9a1905fddd 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -867,6 +867,94 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s, } } +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX_EXTRA)) { +return 0; +} + +/* + * According to virtio_net_reset(), device turns all-unicast mode + * off by default. + * + * Therefore, QEMU should only send this CVQ command if the driver + * sets all-unicast mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (n->alluni) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_ALLUNI, 1); +if (dev_written < 0) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + +/* + * According to virtio_net_reset(), device turns non-multicast mode + * off by default. + * + * Therefore, QEMU should only send this CVQ command if the driver + * sets non-multicast mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (n->nomulti) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_NOMULTI, 1); +if (dev_written < 0) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + +/* + * According to virtio_net_reset(), device turns non-unicast mode + * off by default. + * + * Therefore, QEMU should only send this CVQ command if the driver + * sets non-unicast mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (n->nouni) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_NOUNI, 1); +if (dev_written < 0) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + +/* + * According to virtio_net_reset(), device turns non-broadcast mode + * off by default. + * + * Therefore, QEMU should only send this CVQ command if the driver + * sets non-broadcast mode on, different from the device's defaults. + * + * Note that the device's defaults can mismatch the driver's + * configuration only at live migration. + */ +if (n->nobcast) { +dev_written = vhost_vdpa_net_load_rx_mode(s, +VIRTIO_NET_CTRL_RX_NOBCAST, 1); +if (dev_written < 0) { +return dev_written; +} +if (*s->status != VIRTIO_NET_OK) { +return -EIO; +} +} + return 0; } -- 2.25.1
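The four mode blocks in this patch share one shape, differing only in the command value and the flag tested. Not what the patch does, but as a possible condensation, a table-driven sketch (command values from the VirtIO spec; printf stands in for vhost_vdpa_net_load_rx_mode()):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_NET_CTRL_RX_ALLUNI  2
#define VIRTIO_NET_CTRL_RX_NOMULTI 3
#define VIRTIO_NET_CTRL_RX_NOUNI   4
#define VIRTIO_NET_CTRL_RX_NOBCAST 5

struct model { bool alluni, nomulti, nouni, nobcast; };

static int load_rx_extra(const struct model *n)
{
    const struct { uint8_t cmd; bool set; } modes[] = {
        { VIRTIO_NET_CTRL_RX_ALLUNI,  n->alluni  },
        { VIRTIO_NET_CTRL_RX_NOMULTI, n->nomulti },
        { VIRTIO_NET_CTRL_RX_NOUNI,   n->nouni   },
        { VIRTIO_NET_CTRL_RX_NOBCAST, n->nobcast },
    };

    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        /* All four modes reset to off, so only "on" needs a command. */
        if (modes[i].set) {
            printf("send VIRTIO_NET_CTRL_RX cmd %u on\n",
                   (unsigned)modes[i].cmd);
        }
    }
    return 0;
}

int main(void)
{
    struct model n = { .nouni = true };
    return load_rx_extra(&n);
}
```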
[PATCH v3 2/2] vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_RX_EXTRA feature. Signed-off-by: Hawkins Jiawei Acked-by: Eugenio Pérez --- net/vhost-vdpa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 9a1905fddd..1df82636c9 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -105,6 +105,7 @@ static const uint64_t vdpa_svq_device_features = BIT_ULL(VIRTIO_NET_F_STATUS) | BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | BIT_ULL(VIRTIO_NET_F_CTRL_RX) | +BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT_ULL(VIRTIO_NET_F_MQ) | BIT_ULL(VIRTIO_F_ANY_LAYOUT) | BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) | -- 2.25.1
Re: [PATCH] vdpa: Increase out buffer size for CVQ commands
On 2023/7/11 2:52, Michael S. Tsirkin wrote:
> On Mon, Jun 26, 2023 at 04:26:04PM +0800, Hawkins Jiawei wrote:
>> It appears that my commit message and comments did not take this into
>> account. I will refactor them in the v2 patch.
>
> does not look like you ever sent v2.
>

Sorry for not mentioning that I have moved the patch to the patch series
titled "Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support" at [1].
The reason for this move is that the bug in question cannot be triggered
until the VIRTIO_NET_CTRL_MAC_TABLE_SET command is exposed by that patch
series.

I will take care of this in my future patch series.

[1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg01577.html

Thanks!
Re: [PATCH] vdpa: Increase out buffer size for CVQ commands
On 2023/7/12 18:45, Michael Tokarev wrote:
> 11.07.2023 04:48, Hawkins Jiawei wrote:
> ..
>> Sorry for not mentioning that I have moved the patch to the patch series
>> titled "Vhost-vdpa Shadow Virtqueue _F_CTRL_RX commands support" at [1].
>> The reason for this move is that the bug in question should not be
>> triggered until the VIRTIO_NET_CTRL_MAC_TABLE_SET command is exposed by
>> this patch series.
>
> Does this mean this particular change is not supposed to be applied to
> -stable, as the other change which exposes the bug isn't in any stable
> series?

Yes, you are right. This bug is related to the VIRTIO_NET_CTRL_MAC_TABLE_SET
command in SVQ, and this command is not exposed in SVQ in any stable branch,
so we do not need to apply the patch to the -stable branch.

Thanks!

>
> Thanks,
>
> /mjt
[PATCH RFC 2/2] vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ
Enable SVQ with the VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 9b929762c5..cdfe8e454e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -100,6 +100,7 @@ static const uint64_t vdpa_svq_device_features =
     BIT_ULL(VIRTIO_NET_F_STATUS) |
     BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
     BIT_ULL(VIRTIO_NET_F_CTRL_RX) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_RX_EXTRA) |
     BIT_ULL(VIRTIO_NET_F_MQ) |
     BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
     BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
--
2.25.1
[PATCH RFC 1/2] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
This patch refactors vhost_vdpa_net_load_rx() to restore the packet
receive filtering state in relation to the VIRTIO_NET_F_CTRL_RX_EXTRA
feature at the device's startup.

Signed-off-by: Hawkins Jiawei
---
 net/vhost-vdpa.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ca800f97e2..9b929762c5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -822,6 +822,36 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
         }
     }

+    if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX_EXTRA)) {
+        /* Load the all-unicast mode */
+        on = n->alluni;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLUNI, on);
+        if (r < 0) {
+            return r;
+        }
+
+        /* Load the non-multicast mode */
+        on = n->nomulti;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOMULTI, on);
+        if (r < 0) {
+            return r;
+        }
+
+        /* Load the non-unicast mode */
+        on = n->nouni;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOUNI, on);
+        if (r < 0) {
+            return r;
+        }
+
+        /* Load the non-broadcast mode */
+        on = n->nobcast;
+        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOBCAST, on);
+        if (r < 0) {
+            return r;
+        }
+    }
+
     return 0;
 }
--
2.25.1
[PATCH RFC 0/2] Vhost-vdpa Shadow Virtqueue _F_CTRL_RX_EXTRA commands support
This series enables the shadowed CVQ to intercept rx commands related to
the VIRTIO_NET_F_CTRL_RX_EXTRA feature, updates the virtio NIC device
model so that QEMU sends this state in a migration, and restores that rx
state in the destination.

To test this patch series, I modify the `n->parent_obj.guest_features`
value in vhost_vdpa_net_load_rx() using gdb, as the Linux virtio-net
driver does not currently support the VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Note that this patch series is based on [1], which has not been merged
yet. I will submit the v2 after that series is merged.

[1]. https://lore.kernel.org/all/cover.1687402580.git.yin31...@gmail.com/

Hawkins Jiawei (2):
  vdpa: Restore packet receive filtering state relative with
    _F_CTRL_RX_EXTRA feature
  vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ

 net/vhost-vdpa.c | 31 +++
 1 file changed, 31 insertions(+)
--
2.25.1
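For anyone reproducing the test, the gdb step could look roughly like
the following. This is only a sketch: it assumes VIRTIO_NET_F_CTRL_RX_EXTRA
is feature bit 20 (per the virtio spec) and that QEMU was built with debug
info so the `n` argument of vhost_vdpa_net_load_rx() is visible:

```gdbscript
break vhost_vdpa_net_load_rx
run
# at the breakpoint, pretend the guest negotiated _F_CTRL_RX_EXTRA
set variable n->parent_obj.guest_features |= (1ULL << 20)
continue
```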
Re: [PATCH] vdpa: Increase out buffer size for CVQ commands
On 2023/6/25 18:48, Eugenio Perez Martin wrote:
> On Thu, Jun 22, 2023 at 3:07 AM Hawkins Jiawei wrote:
>>
>> According to the VirtIO standard, "Since there are no guarantees,
>> it can use a hash filter or silently switch to
>> allmulti or promiscuous mode if it is given too many addresses."
>> To achieve this, QEMU ignores MAC addresses and marks
>> `mac_table.x_overflow` in the device internal state in
>> virtio_net_handle_mac() if the guest sets more than
>> `MAC_TABLE_ENTRIES` MAC addresses for the filter table.
>>
>> However, the problem is that QEMU never marks the `mac_table.x_overflow`
>> for the vdpa device internal state when the guest sets more than
>> `MAC_TABLE_ENTRIES` MAC addresses.
>>
>> To be more specific, currently QEMU offers a buffer size of
>> vhost_vdpa_net_cvq_cmd_len() for CVQ commands, which represents the
>> size of a VIRTIO_NET_CTRL_MAC_TABLE_SET command with at most
>> `MAC_TABLE_ENTRIES` MAC addresses.
>>
>> Consequently, if the guest sets more than `MAC_TABLE_ENTRIES` MAC
>> addresses, QEMU truncates the CVQ command data and copies this
>> incomplete command into the out buffer. In this situation,
>> virtio_net_handle_mac() fails the integrity check and returns
>> VIRTIO_NET_ERR instead of marking `mac_table.x_overflow` and returning
>> VIRTIO_NET_OK, since the copied CVQ command in the buffer is incomplete
>> and flawed.
>>
>> This patch solves this problem by increasing the buffer size to
>> vhost_vdpa_net_cvq_cmd_page_len(), which represents the size of the
>> buffer that is allocated and mmapped. Therefore, everything should work
>> correctly as long as the guest sets fewer than
>> `(vhost_vdpa_net_cvq_cmd_page_len() - sizeof(struct virtio_net_ctrl_hdr)
>> - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
>>
>> Considering the highly unlikely scenario of the guest setting more than
>> that number of MAC addresses for the filter table, this patch should
>> work fine for the majority of cases. If there is a need for more than
>> those entries, we can increase the value of
>> vhost_vdpa_net_cvq_cmd_page_len() in the future, mapping more than one
>> page for command output.
>>
>> Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
>> Signed-off-by: Hawkins Jiawei
>> ---
>>  net/vhost-vdpa.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index 5a72204899..ecfa8852b5 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -784,9 +784,18 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
>>      };
>>      ssize_t dev_written = -EINVAL;
>>
>> +    /*
>> +     * This code truncates the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command
>> +     * and prevents QEMU from marking `mac_table.x_overflow` in the device
>> +     * internal state in virtio_net_handle_mac() if the guest sets more than
>> +     * `(vhost_vdpa_net_cvq_cmd_page_len() - sizeof(struct virtio_net_ctrl_hdr)
>> +     * - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses
>> +     * for the filter table.
>> +     * However, this situation is considered rare, so it is acceptable.
>
> I think we can also fix this situation. If it fits in one page, we can
> still send the same page to the device and virtio_net_handle_ctrl_iov.
> If it does not fit in one page, we can clear all mac filters with
> VIRTIO_NET_CTRL_RX_ALLUNI and / or VIRTIO_NET_CTRL_RX_ALLMULTI.

Hi Eugenio,

Thank you for your review. However, it appears that the approach may not
resolve the situation.
When the CVQ command exceeds one page, vhost_vdpa_net_handle_ctrl_avail()
truncates the command, so the hardware receives a malformed SVQ command,
which in turn triggers an error acknowledgement to QEMU. So if the CVQ
command exceeds one page, vhost_vdpa_net_handle_ctrl_avail() should not
update the device's internal state, because the SVQ command fails.
(Please correct me if I am wrong)

It appears that my commit message and comments did not take this into
account. I will refactor them in the v2 patch.

Thanks!

>
> Thanks!
>
>> +     */
>>      out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
>>                               s->cvq_cmd_out_buffer,
>> -                             vhost_vdpa_net_cvq_cmd_len());
>> +                             vhost_vdpa_net_cvq_cmd_page_len());
>>      if (*(uint8_t *)s->cvq_cmd_out_buffer == VIRTIO_NET_CTRL_ANNOUNCE) {
>>          /*
>>           * Guest announce capability is emulated by qemu, so don't forward to
>> --
>> 2.25.1
>>
>
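To put a number on the bound quoted above: assuming
vhost_vdpa_net_cvq_cmd_page_len() is a 4 KiB page (an assumption; it is
whatever page size the CVQ buffer is mmapped with), the limit works out
to 681 filter entries. A quick standalone check:

```c
/*
 * Back-of-the-envelope check of the MAC filter limit quoted above,
 * assuming a 4 KiB command page.  struct virtio_net_ctrl_hdr is
 * 2 bytes (class + command), each virtio_net_ctrl_mac header carries
 * a 4-byte little-endian entries count, and ETH_ALEN is 6.
 */
#include <stdio.h>

int main(void)
{
    const int page_len = 4096; /* assumed vhost_vdpa_net_cvq_cmd_page_len() */
    const int ctrl_hdr = 2;    /* sizeof(struct virtio_net_ctrl_hdr) */
    const int mac_hdr = 4;     /* sizeof(struct virtio_net_ctrl_mac) */
    const int eth_alen = 6;

    int max_macs = (page_len - ctrl_hdr - 2 * mac_hdr) / eth_alen;
    printf("max MAC filter entries per page: %d\n", max_macs); /* 681 */
    return 0;
}
```

So a single page already covers far more than the 64 entries of
MAC_TABLE_ENTRIES; only a guest programming hundreds of filters would
hit the remaining truncation case.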
Re: [PATCH RFC 1/3] vdpa: Restore MAC address filtering state
On 2023/6/25 18:37, Eugenio Perez Martin wrote:
> On Thu, Jun 22, 2023 at 5:02 AM Hawkins Jiawei wrote:
>>
>> This patch refactors vhost_vdpa_net_load_mac() to
>> restore the MAC address filtering state at device's startup.
>>
>> Signed-off-by: Hawkins Jiawei
>> ---
>>  net/vhost-vdpa.c | 39 ++-
>>  1 file changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index ecfa8852b5..10264d3e96 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -651,8 +651,45 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
>>      if (unlikely(dev_written < 0)) {
>>          return dev_written;
>>      }
>> +    if (*s->status != VIRTIO_NET_OK) {
>> +        return -EINVAL;
>> +    }
>
> I think this part should go in its individual patch, explaining why it
> is needed and with corresponding Fixes tag.

Yes, you are right. And I have already submitted that patch "Return
-EINVAL if device's ack is VIRTIO_NET_ERR" at [1], as mentioned in the
cover letter for this series at [2].

[1]. https://lore.kernel.org/all/cover.1686746406.git.yin31...@gmail.com/#t
[2]. https://lore.kernel.org/all/cover.1687402580.git.yin31...@gmail.com/

>
>> +    }
>> +
>> +    if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) {
>> +        /* Load the MAC address filtering */
>> +        uint32_t uni_entries = n->mac_table.first_multi,
>> +                 uni_macs_size = uni_entries * ETH_ALEN,
>> +                 uni_size = sizeof(struct virtio_net_ctrl_mac) + uni_macs_size,
>> +                 mul_entries = n->mac_table.in_use - uni_entries,
>> +                 mul_macs_size = mul_entries * ETH_ALEN,
>> +                 mul_size = sizeof(struct virtio_net_ctrl_mac) + mul_macs_size,
>> +                 data_size = uni_size + mul_size;
>> +        void *data = g_malloc(data_size);
>
> If we keep this part, please use g_autofree here [1].
>
> But I think it is not worth copying all the data actually. Maybe it is
> worth it to convert vhost_vdpa_net_load_cmd to const iovec?

Yes, you are right. I will try to add a helper function in the v2 patch
as you have suggested.

Thanks!

>
> Thanks!
>
> [1] https://www.qemu.org/docs/master/devel/style#automatic-memory-deallocation
>
>> +        struct virtio_net_ctrl_mac *ctrl_mac;
>> +
>> +        /* Pack the non-multicast(unicast) MAC addresses */
>> +        ctrl_mac = data;
>> +        ctrl_mac->entries = cpu_to_le32(uni_entries);
>> +        memcpy(ctrl_mac->macs, n->mac_table.macs, uni_macs_size);
>> +
>> +        /* Pack the multicast MAC addresses */
>> +        ctrl_mac = data + uni_size;
>> +        ctrl_mac->entries = cpu_to_le32(mul_entries);
>> +        memcpy(ctrl_mac->macs, &n->mac_table.macs[uni_macs_size],
>> +               mul_macs_size);
>> +
>> +        ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
>> +                                                      VIRTIO_NET_CTRL_MAC_TABLE_SET,
>> +                                                      data, data_size);
>> +        g_free(data);
>>
>> -        return *s->status != VIRTIO_NET_OK;
>> +        if (unlikely(dev_written < 0)) {
>> +            return dev_written;
>> +        }
>> +        if (*s->status != VIRTIO_NET_OK) {
>> +            return -EINVAL;
>> +        }
>>      }
>>
>>      return 0;
>> --
>> 2.25.1
>>
>
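For what it is worth, the const-iovec direction suggested above could
look roughly like this. This is only a sketch: it assumes a hypothetical
vhost_vdpa_net_load_cmd() variant taking (const struct iovec *, size_t)
that linearizes the segments into the CVQ buffer, and
vhost_vdpa_net_load_mac_table() is an illustrative name.

```c
/*
 * Sketch: describe the VIRTIO_NET_CTRL_MAC_TABLE_SET payload as four
 * iovec segments instead of packing it into a temporary buffer.
 */
static int vhost_vdpa_net_load_mac_table(VhostVDPAState *s,
                                         const VirtIONet *n)
{
    uint32_t uni_entries = n->mac_table.first_multi;
    uint32_t mul_entries = n->mac_table.in_use - uni_entries;
    /* entry counts are little-endian on the wire */
    uint32_t uni_le = cpu_to_le32(uni_entries);
    uint32_t mul_le = cpu_to_le32(mul_entries);
    const struct iovec data[] = {
        { .iov_base = &uni_le, .iov_len = sizeof(uni_le) },
        { .iov_base = (void *)n->mac_table.macs,
          .iov_len = uni_entries * ETH_ALEN },
        { .iov_base = &mul_le, .iov_len = sizeof(mul_le) },
        { .iov_base = (void *)&n->mac_table.macs[uni_entries * ETH_ALEN],
          .iov_len = mul_entries * ETH_ALEN },
    };
    ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MAC,
                                        VIRTIO_NET_CTRL_MAC_TABLE_SET,
                                        data, ARRAY_SIZE(data));

    if (unlikely(dev_written < 0)) {
        return dev_written;
    }
    return *s->status != VIRTIO_NET_OK ? -EINVAL : 0;
}
```

The win is that the unicast and multicast halves of n->mac_table.macs
are referenced in place, so no temporary buffer (and no
g_malloc()/g_free() pair) is needed.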
Re: [PATCH RFC 2/3] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature
On 2023/6/25 18:56, Eugenio Perez Martin wrote:
> On Thu, Jun 22, 2023 at 5:02 AM Hawkins Jiawei wrote:
>>
>> This patch introduces vhost_vdpa_net_load_rx_mode()
>> and vhost_vdpa_net_load_rx() to restore the packet
>> receive filtering state in relation to the
>> VIRTIO_NET_F_CTRL_RX feature at the device's startup.
>>
>> Signed-off-by: Hawkins Jiawei
>> ---
>>  net/vhost-vdpa.c | 74
>>  1 file changed, 74 insertions(+)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index 10264d3e96..355a6aef15 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -754,6 +754,76 @@ static int vhost_vdpa_net_load_offloads(VhostVDPAState *s,
>>      return *s->status != VIRTIO_NET_OK;
>>  }
>>
>> +static int vhost_vdpa_net_load_rx_mode(VhostVDPAState *s,
>> +                                       uint8_t cmd,
>> +                                       uint8_t on)
>> +{
>> +    ssize_t dev_written;
>> +    dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_RX,
>> +                                          cmd, &on, sizeof(on));
>> +    if (unlikely(dev_written < 0)) {
>> +        return dev_written;
>> +    }
>> +    if (*s->status != VIRTIO_NET_OK) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
>> +                                  const VirtIONet *n)
>> +{
>> +    uint8_t on;
>> +    int r;
>> +
>> +    if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX)) {
>> +        /* Load the promiscuous mode */
>> +        if (n->mac_table.uni_overflow) {
>> +            /*
>> +             * According to the VirtIO standard, "Since there are no
>> +             * guarantees, it can use a hash filter or silently switch
>> +             * to allmulti or promiscuous mode if it is given too many
>> +             * addresses."
>> +             *
>> +             * QEMU ignores non-multicast(unicast) MAC addresses and
>> +             * marks `uni_overflow` for the device internal state
>> +             * if the guest sets too many non-multicast(unicast) MAC
>> +             * addresses. Therefore, we should turn promiscuous mode
>> +             * on in this case.
>> +             */
>> +            on = 1;
>> +        } else {
>> +            on = n->promisc;
>> +        }
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>> +
>> +        /* Load the all-multicast mode */
>> +        if (n->mac_table.multi_overflow) {
>> +            /*
>> +             * According to the VirtIO standard, "Since there are no
>> +             * guarantees, it can use a hash filter or silently switch
>> +             * to allmulti or promiscuous mode if it is given too many
>> +             * addresses."
>> +             *
>> +             * QEMU ignores multicast MAC addresses and
>> +             * marks `multi_overflow` for the device internal state
>> +             * if the guest sets too many multicast MAC addresses.
>> +             * Therefore, we should turn all-multicast mode on in this
>> +             * case.
>> +             */
>> +            on = 1;
>> +        } else {
>> +            on = n->allmulti;
>> +        }
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLMULTI, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>
> Can we skip the sending of the CVQ commands if the state matches the
> default state?

Thanks for the reminder; I forgot this part when coding. I will refactor
the patch according to your suggestion and take care of this part in the
following patches.

Thanks!

>
> Thanks!
>
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static int vhost_vdpa_net_load(NetClientState *nc)
>>  {
>>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>> @@ -780,6 +850,10 @@ static int vhost_vdpa_net_load(NetClientState *nc)
>>      if (unlikely(r)) {
>>          return r;
>>      }
>> +    r = vhost_vdpa_net_load_rx(s, n);
>> +    if (unlikely(r)) {
>> +        return r;
>> +    }
>>
>>      return 0;
>>  }
>> --
>> 2.25.1
>>
>
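Note that the suggested skip is not symmetric across the two modes:
virtio_net_reset() leaves promiscuous mode on and all-multicast off,
and the overflow flags can only push each mode towards on. A sketch of
how the guards might look (vhost_vdpa_net_load_rx_nondefault is a
hypothetical helper built on the patch's vhost_vdpa_net_load_rx_mode(),
not the final code):

```c
/*
 * Sketch of the "skip the default state" idea for _F_CTRL_RX.  After
 * reset the device has promisc = on and allmulti = off, so a command
 * is only needed when the driver state (or an overflow fallback)
 * deviates from that default.
 */
static int vhost_vdpa_net_load_rx_nondefault(VhostVDPAState *s,
                                             const VirtIONet *n)
{
    int r;

    /* promisc defaults to on: only send the command to turn it off,
     * and never when a unicast-table overflow forces it back on */
    if (!n->mac_table.uni_overflow && !n->promisc) {
        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_PROMISC, 0);
        if (r < 0) {
            return r;
        }
    }

    /* allmulti defaults to off: only send the command to turn it on,
     * which an overflowed multicast table also requires */
    if (n->mac_table.multi_overflow || n->allmulti) {
        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLMULTI, 1);
        if (r < 0) {
            return r;
        }
    }

    return 0;
}
```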
Re: [PATCH RFC 1/2] vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
On 2023/6/25 18:54, Eugenio Perez Martin wrote:
> On Fri, Jun 23, 2023 at 3:26 PM Hawkins Jiawei wrote:
>>
>> This patch refactors vhost_vdpa_net_load_rx() to
>> restore the packet receive filtering state in relation to
>> the VIRTIO_NET_F_CTRL_RX_EXTRA feature at the device's startup.
>>
>> Signed-off-by: Hawkins Jiawei
>> ---
>>  net/vhost-vdpa.c | 30 ++
>>  1 file changed, 30 insertions(+)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index ca800f97e2..9b929762c5 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -822,6 +822,36 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
>>          }
>>      }
>>
>> +    if (virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX_EXTRA)) {
>> +        /* Load the all-unicast mode */
>> +        on = n->alluni;
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_ALLUNI, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>> +
>> +        /* Load the non-multicast mode */
>> +        on = n->nomulti;
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOMULTI, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>> +
>> +        /* Load the non-unicast mode */
>> +        on = n->nouni;
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOUNI, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>> +
>> +        /* Load the non-broadcast mode */
>> +        on = n->nobcast;
>> +        r = vhost_vdpa_net_load_rx_mode(s, VIRTIO_NET_CTRL_RX_NOBCAST, on);
>> +        if (r < 0) {
>> +            return r;
>> +        }
>> +    }
>> +
>
> Can we skip the load if the state matches the virtio defaults? By
> virtio_net_reset() the defaults are:
>     n->promisc = 1;
>     n->allmulti = 0;
>     n->alluni = 0;
>     n->nomulti = 0;
>     n->nouni = 0;
>     n->nobcast = 0;

Yes, you are right. Thanks for the reminder; I forgot this part when
coding. I will refactor the patch according to your suggestion and take
care of this part in the following patches.

Thanks!

>
> Thanks!
>
>>      return 0;
>>  }
>>
>> --
>> 2.25.1
>>
>