Tested-by: Qian Xu <qian.q.xu at intel.com>

- Apply patch to dpdk-next-virtio: Pass
- Compile: Pass
- OS: Ubuntu16.04 4.4.0-34-generic
- GCC: 5.4.0
- Test Case - Pass: over 20% performance gain for big packets (1024B), which is the case this feature is designed to improve.
- Test case: without NIC, vhost dequeue, virtio txonly, mergeable=on: ~28% performance gain for packet size 1518B; for small packets (64B), performance is similar to zero-copy=0.
- Test case: with Intel FVL 40G NIC, PVP case, txd=128, mergeable=on: for packet sizes of 1K (1024B) and up there is a clear performance benefit. For example, 1024B gets an 18% gain and 1518B gets a 26% gain compared with zero-copy=0. For small packets such as 64B there is a 15% performance drop, which is reasonable, since vhost zero copy is not meant to help small packet performance.

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu
Sent: Sunday, October 9, 2016 8:28 AM
To: dev at dpdk.org
Cc: Maxime Coquelin <maxime.coquelin at redhat.com>; Yuanhan Liu <yuanhan.liu at linux.intel.com>
Subject: [dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support

This patch set enables vhost dequeue zero copy. The majority of the work
goes to patch 4: "vhost: add dequeue zero copy".

The basic idea of dequeue zero copy is that, instead of copying data from
the desc buf, we let the mbuf reference the desc buf addr directly.

The major issue behind that is how and when to update the used ring.
You can check the commit log of patch 4 for more details.

Patch 5 introduces a new flag, RTE_VHOST_USER_DEQUEUE_ZERO_COPY, to enable
dequeue zero copy, which is disabled by default.

The performance gain is quite impressive. For a simple dequeue workload
(running rxonly in vhost-pmd and running txonly in guest testpmd), it yields
a 50+% performance boost for packet size 1500B. For the VM2VM iperf test
case, it's even better: about a 70% boost.

For small packets, the performance is worse (this is expected, as the extra
overhead introduced by zero copy outweighs the benefit of saving a copy of
a few bytes).

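To make the "reference the desc buf instead of copying" idea and the used ring question concrete, here is a minimal standalone sketch. It is not code from the patch set and uses no DPDK API: struct vq, struct pkt, dequeue_copy(), dequeue_zero_copy() and pkt_free() are hypothetical names made up for illustration. It also assumes the simplest possible policy, namely that in zero-copy mode the used index is advanced only when the packet is freed; the real scheme is described in the commit log of patch 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4
#define BUF_SIZE  64

/* Hypothetical stand-in for a guest descriptor buffer. */
struct desc_buf {
    uint8_t data[BUF_SIZE];
    size_t  len;
};

/* Hypothetical stand-in for an mbuf. */
struct pkt {
    const uint8_t *data;        /* where the payload lives */
    size_t   len;
    uint8_t  storage[BUF_SIZE]; /* private copy, used in copy mode only */
    int      desc_idx;          /* >= 0 while borrowing a desc buf, else -1 */
};

/* Hypothetical, heavily simplified virtqueue state. */
struct vq {
    struct desc_buf desc[RING_SIZE];
    uint16_t last_avail_idx;    /* next descriptor to dequeue */
    uint16_t used_idx;          /* descriptors handed back to the guest */
};

/* Copy mode: copy the payload, then return the descriptor right away. */
void dequeue_copy(struct vq *vq, struct pkt *p)
{
    uint16_t idx = vq->last_avail_idx++ % RING_SIZE;

    assert(vq->desc[idx].len <= sizeof(p->storage));
    memcpy(p->storage, vq->desc[idx].data, vq->desc[idx].len);
    p->data = p->storage;
    p->len = vq->desc[idx].len;
    p->desc_idx = -1;
    vq->used_idx++;             /* guest may reuse the buffer immediately */
}

/* Zero-copy mode: point at the desc buf and keep the descriptor for now. */
void dequeue_zero_copy(struct vq *vq, struct pkt *p)
{
    uint16_t idx = vq->last_avail_idx++ % RING_SIZE;

    assert(vq->desc[idx].len <= BUF_SIZE);
    p->data = vq->desc[idx].data;
    p->len = vq->desc[idx].len;
    p->desc_idx = idx;          /* used_idx is NOT advanced here */
}

/* Release a packet; a borrowed descriptor is only returned at this point. */
void pkt_free(struct vq *vq, struct pkt *p)
{
    if (p->desc_idx >= 0) {
        vq->used_idx++;         /* assumption: update used ring on free */
        p->desc_idx = -1;
    }
    p->data = NULL;
    p->len = 0;
}
```

In this sketch the copy path pays for a memcpy() per packet but can hand the descriptor back at once, while the zero-copy path skips the memcpy() but holds the descriptor until pkt_free(). That extra bookkeeping is a fixed per-packet cost, which is consistent with the gain showing up for large packets and the drop for 64B packets.
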
v3: - rebase: mainly for removing conflicts with the Tx indirect patch
    - don't update last_used_idx twice for zero-copy mode
    - handle two missing "Tx -> dequeue" renames in log and usage

v2: - renamed "tx zero copy" to "dequeue zero copy", to reduce confusion.
    - handle the case that a desc buf might cross 2 host phys pages
    - use MAP_POPULATE to let kernel populate the page tables
    - updated release note
    - doc-ed the limitations for the vm2nic case
    - merge 2 contiguous guest phys memory regions
    - and a few more trivial changes, please see them in the
      corresponding patches

---
Yuanhan Liu (7):
  vhost: simplify memory regions handling
  vhost: get guest/host physical address mappings
  vhost: introduce last avail idx for dequeue
  vhost: add dequeue zero copy
  vhost: add a flag to enable dequeue zero copy
  examples/vhost: add an option to enable dequeue zero copy
  net/vhost: add an option to enable dequeue zero copy

 doc/guides/prog_guide/vhost_lib.rst    |  35 +++-
 doc/guides/rel_notes/release_16_11.rst |  13 ++
 drivers/net/vhost/rte_eth_vhost.c      |  13 ++
 examples/vhost/main.c                  |  19 +-
 lib/librte_vhost/rte_virtio_net.h      |   1 +
 lib/librte_vhost/socket.c              |   5 +
 lib/librte_vhost/vhost.c               |  12 ++
 lib/librte_vhost/vhost.h               | 102 ++++++++---
 lib/librte_vhost/vhost_user.c          | 315 ++++++++++++++++++++++-----------
 lib/librte_vhost/virtio_net.c          | 196 +++++++++++++++++---
 10 files changed, 549 insertions(+), 162 deletions(-)

-- 
1.9.0