Hi VPP maintainers,


VPP recently upgraded its DPDK version to DPDK 21.08, which includes two 
optimization patches [1][2] from the Arm DPDK team. When the mbuf-fast-free 
flag is set, these patches accelerate mbuf freeing in the PMD TX path of the 
i40e driver, which shows a clear performance improvement in DPDK L3FWD 
benchmarking results.
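
For context, here is a minimal sketch (my own illustration, not the actual
patch code) of what the fast-free flag allows a PMD to do when recycling
transmitted mbufs:

/*
 * Conceptual sketch only -- not the actual i40e code. It contrasts the
 * generic mbuf free path with the fast-free path that
 * DEV_TX_OFFLOAD_MBUF_FAST_FREE permits.
 */
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Generic path: each segment may have ref_cnt > 1, indirect/external
 * buffers, or come from a different mempool, so every mbuf is freed
 * individually. */
static void
tx_free_generic (struct rte_mbuf **pkts, uint16_t n)
{
  for (uint16_t i = 0; i < n; i++)
    rte_pktmbuf_free_seg (pkts[i]);
}

/* Fast-free path: the application guarantees single-segment mbufs with
 * ref_cnt == 1, all from one mempool, so the whole batch can be returned
 * with a single bulk operation. */
static void
tx_free_fast (struct rte_mbuf **pkts, uint16_t n)
{
  if (n > 0)
    rte_mempool_put_bulk (pkts[0]->pool, (void **) pkts, n);
}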



I tried to verify the benefits these optimization patches can bring to VPP, 
but found that the mbuf-fast-free flag is not enabled by default in VPP+DPDK.

Applying DPDK mbuf-fast-free has some constraints (see the sketch after this 
list), e.g.:

  *   mbufs to be freed must come from the same mempool
  *   ref_cnt must always be 1 in the mbuf metadata when the application calls 
rte_eth_tx_burst()
  *   no TX checksum offload
  *   no jumbo frames
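
For illustration, a minimal sketch of how a plain DPDK application would
request this offload at configure time. The helper name is hypothetical, and
the port/queue parameters are assumed to be set up elsewhere; the PMD's
capabilities are checked first:

#include <rte_ethdev.h>

/* Hypothetical helper: request MBUF_FAST_FREE only when the PMD
 * advertises support for it, then configure the port. */
static int
configure_port_with_fast_free (uint16_t port_id, uint16_t nb_rxq,
                               uint16_t nb_txq)
{
  struct rte_eth_dev_info dev_info;
  struct rte_eth_conf port_conf = { 0 };
  int rv;

  rv = rte_eth_dev_info_get (port_id, &dev_info);
  if (rv != 0)
    return rv;

  /* Only request the offload when the driver supports it. */
  if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
    port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;

  return rte_eth_dev_configure (port_id, nb_rxq, nb_txq, &port_conf);
}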

But VPP vector mode (set by adding the 'no-tx-checksum-offload' and 
'no-multi-seg' parameters to the dpdk section of startup.conf) seems to 
satisfy all of these requirements. So I made the small code change shown below 
to set the mbuf-fast-free flag by default in VPP vector mode, and benchmarked 
IPv4 routing test cases with 1 flow and with 10k flows. The results show both 
a throughput improvement and CPU cycles saved in the DPDK transmit function.
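
For reference, the vector-mode setup described above corresponds to the
following dpdk section in startup.conf:

dpdk {
  no-multi-seg
  no-tx-checksum-offload
}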



So, any thoughts on enabling the mbuf-fast-free TX offload flag in VPP vector 
mode? Any feedback is welcome :)



Code Changes:



diff --git a/src/plugins/dpdk/device/init.c b/src/plugins/dpdk/device/init.c
index f7c1cc106..0fbdd2317 100644
--- a/src/plugins/dpdk/device/init.c
+++ b/src/plugins/dpdk/device/init.c
@@ -398,6 +398,8 @@ dpdk_lib_init (dpdk_main_t * dm)
          xd->port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
          xd->flags |= DPDK_DEVICE_FLAG_MAYBE_MULTISEG;
        }
+      if (dm->conf->no_multi_seg && dm->conf->no_tx_checksum_offload)
+       xd->port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;

       xd->tx_q_used = clib_min (dev_info.max_tx_queues, tm->n_vlib_mains);



Benchmark Results:



1 flow, bidirectional

Throughput (Mpps):

             Original   Patched   Ratio
N1SDP        11.62      12.44     +7.06%
ThunderX2     9.52      10.16     +6.30%
Dell 8268    17.82      18.20     +2.13%



CPU cycles overhead of the DPDK transmit function (share of total cycles, 
recorded with perf):

             Original   Patched   Delta
N1SDP        13.08%      5.53%    -7.55 pp
ThunderX2    11.01%      6.68%    -4.33 pp
Dell 8268    10.78%      7.35%    -3.43 pp



10k flows, bidirectional

Throughput (Mpps):

             Original   Patched   Ratio
N1SDP         8.48       9.00     +6.13%
ThunderX2     8.84       9.26     +4.75%
Dell 8268    15.04      15.40     +2.39%



CPU cycles overhead of the DPDK transmit function (share of total cycles, 
recorded with perf):

             Original   Patched   Delta
N1SDP        10.58%      4.54%    -6.04 pp
ThunderX2    12.92%      6.63%    -6.29 pp
Dell 8268    10.36%      7.97%    -2.39 pp



[1] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=be8ff6210851fdacbe00033259b7dc5426e95589
[2] http://git.dpdk.org/dpdk/commit/?h=v21.08-rc1&id=95e7bb6a5fc9e371e763b11ec15786e4d574ef8e



Best Regards,

Jieqiang Wang
