Hi all,

Recently I found a big performance difference (~5 Gbps vs. ~300 Mbps, measured with iperf3) between two settings related to AF_PACKET.
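(For reference, the throughput numbers come from a plain TCP iperf3 run; the commands were roughly the following, with the server address left as a placeholder -- the exact topology and addressing are described below.)

  # on the box receiving the traffic
  iperf3 -s

  # on the box running VPP, sending into the AF_PACKET (host-vpp0) path
  iperf3 -c <iperf-server-ip>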
The testing topology: two servers are directly connected through a 10G link. One runs VPP with the 10G NIC and an AF_PACKET interface, and the iperf3 client (TCP) sends packets into the AF_PACKET interface. The other runs the iperf3 server.

The startup.conf settings:

...
cpu {
  main-core 18
  corelist-workers 19,20,21,22
}
dev 0000:af:00.1 {
  workers 0
  # workers 1
}
...

The only difference between the two tests is the "workers" setting: test1 sets "workers 0", test2 sets "workers 1". iperf3 reports ~5 Gbps for test1 but only ~300 Mbps for test2.

So I ran "show run" to see what is happening and where the difference comes from. Below is the "show run" output; I only paste the worker cores.

Test1 (workers 0):

Thread 1 vpp_wk_0 (lcore 19)
Time 1.2, average vectors/node 32.01, last 128 main loops .05 per node 1.00
  vector rates in 4.7033e5, out 4.7033e5, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls      Vectors   Suspends     Clocks   Vectors/Call
TenGigabitEthernetaf/0/1-outpu   active           8134       528460          0     1.32e1          64.97
TenGigabitEthernetaf/0/1-tx      active           8134       528460          0     1.32e2          64.97
af-packet-input                  interrupt wa   184542       528460          0     8.31e2           2.86
dpdk-input                       polling        038663        40390          0     5.87e3            .02
ethernet-input                   active           8134       528460          0     2.91e1          64.97
host-vpp0-output                 active          11729        40390          0     8.22e1           3.44
host-vpp0-tx                     active          11729        40390          0     4.37e4           3.44
ip4-input                        active           8134       528460          0     3.89e1          64.97
ip4-input-no-checksum            active          11729        40390          0     2.03e2           3.44
ip4-lookup                       active          19863       568850          0     3.68e1          28.64
ip4-rewrite                      active          19863       568850          0     3.62e1          28.64

Test2 (workers 1):

Thread 1 vpp_wk_0 (lcore 19)
Time 1.3, average vectors/node 256.00, last 128 main loops 0.00 per node 0.00
  vector rates in 2.4971e4, out 2.4971e4, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls      Vectors   Suspends     Clocks   Vectors/Call
TenGigabitEthernetaf/0/1-outpu   active            130        33280          0     9.40e0         256.00
TenGigabitEthernetaf/0/1-tx      active            130        33280          0     1.17e2         256.00
af-packet-input                  interrupt wa     8146        33280          0     6.34e2           4.09
ethernet-input                   active            130        33280          0     2.84e1         256.00
ip4-input                        active            130        33280          0     3.56e1         256.00
ip4-lookup                       active            130        33280          0     3.13e1         256.00
ip4-rewrite                      active            130        33280          0     2.97e1         256.00
unix-epoll-input                 polling          9317            0          0     4.62e5           0.00
---------------
Thread 2 vpp_wk_1 (lcore 20)
Time 1.3, average vectors/node 1.44, last 128 main loops 0.00 per node 0.00
  vector rates in 5.3198e2, out 5.3198e2, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls      Vectors   Suspends     Clocks   Vectors/Call
dpdk-input                       polling      18232975          709          0     2.29e6           0.00
host-vpp0-output                 active            492          709          0     1.91e2           1.44
host-vpp0-tx                     active            492          709          0     1.51e5           1.44
ip4-input-no-checksum            active            492          709          0     3.20e2           1.44
ip4-lookup                       active            492          709          0     2.16e2           1.44
ip4-rewrite                      active            492          709          0     1.88e2           1.44

My observation: in test1, dpdk-input and af-packet-input run on the same core (lcore 19), while in test2, dpdk-input (lcore 20) and af-packet-input (lcore 19) run on different cores.

My questions: Is this the reason for such a huge performance difference? If so, it seems af-packet-input is by default handled by the first worker core. Is there a way to pin af-packet-input and dpdk-input to the same core? (A sketch of what I have in mind is appended as a P.S. below.)

My VPP config is simple:

set interface state TenGigabitEthernetaf/0/1 up
set interface ip address TenGigabitEthernetaf/0/1 10.1.1.1/24
set interface promiscuous on TenGigabitEthernetaf/0/1
create host-interface name vpp0
set interface state host-vpp0 up
set interface ip address host-vpp0 10.1.2.1/24
set interface promiscuous on host-vpp0

Many thanks.

Thx,
Xuekun
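P.S. Would "set interface rx-placement" be the right knob here? I have not tried it yet; below is a rough sketch of what I have in mind, assuming the command also applies to af-packet/host interfaces and that worker 0 is the worker polling the 10G NIC in my setup:

  vpp# show interface rx-placement
  vpp# set interface rx-placement host-vpp0 worker 0

The idea would be to move the host-vpp0 RX queue onto the same worker as dpdk-input, i.e. to reproduce explicitly what seems to happen implicitly in test1.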