Hi VPP maintainers,
Recently VPP upgraded its DPDK version to DPDK 21.08, which includes two optimization patches [1][2] from the Arm DPDK team. When the mbuf-fast-free flag is set, the two patches add code to accelerate mbuf freeing in the i40e PMD TX path, which shows a clear performance improvement in DPDK L3FWD benchmarks. I tried to verify the benefit these patches can bring to VPP, but found that the mbuf-fast-free flag is not enabled by default in VPP+DPDK.

Applying DPDK mbuf-fast-free has some constraints, e.g.:
* mbufs to be freed must come from the same mempool
* ref_cnt must always be 1 in the mbuf meta-data when the application calls rte_eth_tx_burst()
* no TX checksum offload
* no jumbo frames

But VPP vector mode (set by adding the 'no-tx-checksum-offload' and 'no-multi-seg' parameters to the dpdk section of startup.conf) seems to satisfy all of these requirements. So I made the code change shown below to set the mbuf-fast-free flag by default in VPP vector mode and ran benchmarks on IPv4 routing test cases with 1 flow and 10k flows. The results show both a throughput improvement and CPU cycles saved in the DPDK transmit function (a capability-checked variant of the same change is sketched after the references at the end of this mail).

Any thoughts on enabling the mbuf-fast-free TX offload flag in VPP vector mode? Any feedback is welcome :)

Code Changes:

diff --git a/src/plugins/dpdk/device/init.c b/src/plugins/dpdk/device/init.c
index f7c1cc106..0fbdd2317 100644
--- a/src/plugins/dpdk/device/init.c
+++ b/src/plugins/dpdk/device/init.c
@@ -398,6 +398,8 @@ dpdk_lib_init (dpdk_main_t * dm)
          xd->port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
          xd->flags |= DPDK_DEVICE_FLAG_MAYBE_MULTISEG;
        }
+      if (dm->conf->no_multi_seg && dm->conf->no_tx_checksum_offload)
+       xd->port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;

       xd->tx_q_used = clib_min (dev_info.max_tx_queues, tm->n_vlib_mains);

Benchmark Results:

1 flow, bidirectional

Throughput (Mpps):
             Original   Patched   Ratio
N1SDP          11.62     12.44   +7.06%
ThunderX2       9.52     10.16   +6.30%
Dell 8268      17.82     18.20   +2.13%

CPU cycle overhead of the DPDK transmit function (recorded with perf):
             Original   Patched   Ratio
N1SDP          13.08%     5.53%   -7.55%
ThunderX2      11.01%     6.68%   -4.33%
Dell 8268      10.78%     7.35%   -3.43%

10k flows, bidirectional

Throughput (Mpps):
             Original   Patched   Ratio
N1SDP           8.48      9.00   +6.13%
ThunderX2       8.84      9.26   +4.75%
Dell 8268      15.04     15.40   +2.39%

CPU cycle overhead of the DPDK transmit function (recorded with perf):
             Original   Patched   Ratio
N1SDP          10.58%     4.54%   -6.04%
ThunderX2      12.92%     6.63%   -6.29%
Dell 8268      10.36%     7.97%   -2.39%

[1] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=be8ff6210851fdacbe00033259b7dc5426e95589
[2] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=95e7bb6a5fc9e371e763b11ec15786e4d574ef8e

Best Regards,
Jieqiang Wang
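P.S. For reference, here is a minimal sketch of how the same flag could additionally be gated on the capability the PMD actually advertises, so that drivers without mbuf fast free support are left unaffected. This is only an illustration, not part of the patch above; the helper name dpdk_can_use_mbuf_fast_free() is made up for this example and assumes DPDK 21.08 headers.

#include <rte_ethdev.h>

/* Hypothetical helper (illustration only): returns 1 when it looks safe to
 * request DEV_TX_OFFLOAD_MBUF_FAST_FREE on a port, i.e. VPP runs in vector
 * mode (no multi-seg, no TX checksum offload) and the PMD advertises the
 * capability in tx_offload_capa. */
static int
dpdk_can_use_mbuf_fast_free (uint16_t port_id, int no_multi_seg,
                             int no_tx_checksum_offload)
{
  struct rte_eth_dev_info dev_info;

  if (!no_multi_seg || !no_tx_checksum_offload)
    return 0;

  if (rte_eth_dev_info_get (port_id, &dev_info) != 0)
    return 0;

  return (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) != 0;
}

In dpdk_lib_init() such a check could take the place of the plain config test in the diff above, keeping behaviour unchanged for PMDs that do not implement the fast-free path.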