Hi VPP dev,

I was doing some benchmarking on VPP and found that the no-multi-seg option in startup.conf affects both the performance and the runtime statistics VPP reports. The VPP version is v21.01-rc0~547-gf0419a0c8 and the DPDK version is DPDK 20.11.0.

With the no-multi-seg option set in startup.conf, the runtime shows the following:

Thread 1 vpp_wk_0 (lcore 2)
Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
  vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
Name                     State     Calls     Vectors    Suspends  Clocks    Vectors/Call
dpdk-input               polling   112527    14403456   0         9.34e-1   128.00
eth0-output              active    112527    7201728    0         2.04e-1   64.00
eth0-tx                  active    112527    7201728    0         7.86e-1   64.00
eth1-output              active    112527    7201728    0         1.91e-1   64.00
eth1-tx                  active    112527    7201728    0         7.93e-1   64.00
ethernet-input           active    225054    14403456   0         5.65e-1   64.00
ip4-input-no-checksum    active    112527    14403456   0         3.83e-1   128.00
ip4-lookup               active    112527    14403456   0         5.34e-1   128.00
ip4-rewrite              active    112527    14403456   0         5.73e-1   128.00
unix-epoll-input         polling   110       0          0         2.84e1    0.00
Output for command 'show hardware-interfaces':

vpp# sh hardware-interfaces
Name    Idx   Link  Hardware
eth0    1     up    eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail: vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame scatter keep-crc rss-hash
    rx offload active: ipv4-cksum
    tx offload avail: vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs mbuf-fast-free
    tx offload active: none
    rss avail: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active: none
    tx burst mode: Vector Neon
    rx burst mode: Vector Neon

Without the no-multi-seg option in startup.conf, the runtime shows the following:

Thread 1 vpp_wk_0 (lcore 2)
Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
  vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
Name                     State     Calls     Vectors    Suspends  Clocks    Vectors/Call
dpdk-input               polling   34157     17488384   0         9.51e-1   512.00
eth0-output              active    34157     8744192    0         1.66e-1   256.00
eth0-tx                  active    34157     8744192    0         1.84e0    256.00
eth1-output              active    34157     8744192    0         1.71e-1   256.00
eth1-tx                  active    34157     8744192    0         1.88e0    256.00
ethernet-input           active    68314     17488384   0         4.60e-1   256.00
ip4-input-no-checksum    active    68314     17488384   0         3.58e-1   256.00
ip4-lookup               active    68314     17488384   0         5.29e-1   256.00
ip4-rewrite              active    68314     17488384   0         5.78e-1   256.00
unix-epoll-input         polling   33        0          0         3.39e1    0.00

Output for command 'show hardware-interfaces':

vpp# sh hardware-interfaces
Name    Idx   Link  Hardware
eth0    1     up    eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail: vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame scatter keep-crc rss-hash
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail: vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs mbuf-fast-free
    tx offload active: multi-segs
    rss avail: ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active: none
    tx burst mode: Scalar
    rx burst mode: Vector Neon Scattered

So I am wondering: why does the no-multi-seg option change the vector rates shown above, and is this behavior expected? I also saw a performance drop when no-multi-seg was not set in startup.conf and I sent small packets (e.g. 64 bytes) as input traffic (a simple IPv4 routing test case with 1 flow). How does VPP behave differently when no-multi-seg is set?

I look forward to your feedback.

Thanks,
Jieqiang Wang
View/Reply Online (#18488): https://lists.fd.io/g/vpp-dev/message/18488