Yes, this is expected: 'no-multi-seg' tells DPDK that every packet will consist of exactly one buffer (no chained buffers). Many DPDK PMDs support vectorization (SSE, NEON...) only for this simpler case, so when you set this option DPDK can select the vectorized PMD instead of the more generic, non-vectorized (and hence slower) version. You can see this in the 'show hardware' output: with 'no-multi-seg' you get 'Vector Neon' for both RX and TX, whereas without it RX is still vectorized ('Vector Neon Scattered') but TX falls back to 'Scalar'. So in the 2nd case DPDK TX is slower. This is reflected in the 'show run' output, where the cycles/packet for 'eth0-tx' and 'eth1-tx' grow from about 0.8 to about 1.8. As the TX cost is higher, VPP spends more time on each packet, so it processes bigger vectors (256 instead of 64 per call) and delivers fewer pps overall (about 1.02e7 instead of 1.25e7).
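To make the arithmetic concrete, here is a rough sketch in Python that sums the per-packet clocks for the eth0 path from the two 'show run' outputs quoted below and compares the result with the observed packet rates. It assumes (my assumption, not something the outputs state) that each packet visits each listed node once and that the worker is CPU-bound, so the summed clocks approximate the per-packet cost. The dictionary and function names are only illustrative.

```python
# Per-packet clocks copied from the two 'show run' outputs quoted below.
# Only the eth0 path is used; eth1-output/eth1-tx are nearly identical.
WITH_NO_MULTI_SEG = {
    "dpdk-input": 0.934,
    "ethernet-input": 0.565,
    "ip4-input-no-checksum": 0.383,
    "ip4-lookup": 0.534,
    "ip4-rewrite": 0.573,
    "eth0-output": 0.204,
    "eth0-tx": 0.786,   # TX burst mode: Vector Neon
}
WITHOUT_NO_MULTI_SEG = {
    "dpdk-input": 0.951,
    "ethernet-input": 0.460,
    "ip4-input-no-checksum": 0.358,
    "ip4-lookup": 0.529,
    "ip4-rewrite": 0.578,
    "eth0-output": 0.166,
    "eth0-tx": 1.84,    # TX burst mode: Scalar
}


def path_cost(clocks_per_node):
    """Sum the clocks of every node a packet traverses (assumed: once each)."""
    return sum(clocks_per_node.values())


fast = path_cost(WITH_NO_MULTI_SEG)
slow = path_cost(WITHOUT_NO_MULTI_SEG)

# Ratio of per-packet cost versus ratio of measured 'vector rates in'.
cost_ratio = slow / fast
pps_ratio = 1.2537e7 / 1.0186e7

# Extra clocks spent in eth0-tx alone when TX is scalar.
tx_delta = WITHOUT_NO_MULTI_SEG["eth0-tx"] - WITH_NO_MULTI_SEG["eth0-tx"]

print(f"clocks/packet with no-multi-seg:    {fast:.3f}")
print(f"clocks/packet without no-multi-seg: {slow:.3f}")
print(f"cost ratio: {cost_ratio:.3f}, pps ratio: {pps_ratio:.3f}")
print(f"extra clocks/packet in eth0-tx: {tx_delta:.3f}")
```

Both ratios come out at roughly 1.23, which is consistent with the explanation above: the slowdown is about what the extra TX cost predicts. The other nodes actually get slightly cheaper per packet with the larger vectors, but not enough to offset the scalar TX path.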
ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Jieqiang Wang
> Sent: Friday, 8 January 2021 04:26
> To: vpp-dev <vpp-dev@lists.fd.io>
> Cc: Lijian Zhang <lijian.zh...@arm.com>; Tianyu Li <tianyu...@arm.com>;
> Govindarajan Mohandoss <govindarajan.mohand...@arm.com>; nd <n...@arm.com>
> Subject: [vpp-dev] Questions about no-multi-seg option in startup.conf
>
> Hi VPP dev,
>
> I was trying to do some benchmarking on VPP and found out no-multi-seg
> option in startup.conf will have impact on both the performance and how
> the runtime shows.
>
> The VPP version is v21.01-rc0~547-gf0419a0c8, the DPDK version is DPDK
> 20.11.0.
>
> With no-multi-seg option set in the startup.conf, the runtime shows like
> the following:
>
> Thread 1 vpp_wk_0 (lcore 2)
> Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
>   vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
>   Name                    State      Calls    Vectors   Suspends  Clocks   Vectors/Call
>   dpdk-input              polling    112527   14403456  0         9.34e-1  128.00
>   eth0-output             active     112527   7201728   0         2.04e-1  64.00
>   eth0-tx                 active     112527   7201728   0         7.86e-1  64.00
>   eth1-output             active     112527   7201728   0         1.91e-1  64.00
>   eth1-tx                 active     112527   7201728   0         7.93e-1  64.00
>   ethernet-input          active     225054   14403456  0         5.65e-1  64.00
>   ip4-input-no-checksum   active     112527   14403456  0         3.83e-1  128.00
>   ip4-lookup              active     112527   14403456  0         5.34e-1  128.00
>   ip4-rewrite             active     112527   14403456  0         5.73e-1  128.00
>   unix-epoll-input        polling    110      0         0         2.84e1   0.00
>
> Output for command 'show hardware-interfaces':
>
> vpp# sh hardware-interfaces
>   Name                Idx   Link  Hardware
> eth0                   1     up   eth0
>   Link speed: 40 Gbps
>   Ethernet address 3c:fd:fe:bb:d4:10
>   Intel X710/XL710 Family
>     carrier up full duplex mtu 9206
>     flags: admin-up pmd rx-ip4-cksum
>     Devargs:
>     rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
>     tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
>     pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
>     max rx packet len: 9728
>     promiscuous: unicast off all-multicast on
>     vlan offload: strip off filter off qinq off
>     rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
>                        outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
>                        scatter keep-crc rss-hash
>     rx offload active: ipv4-cksum
>     tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
>                        tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
>                        gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
>                        mbuf-fast-free
>     tx offload active: none
>     rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
>                        ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
>     rss active:        none
>     tx burst mode: Vector Neon
>     rx burst mode: Vector Neon
>
> Without no-multi-seg option in startup.conf, the runtime shows as below:
>
> Thread 1 vpp_wk_0 (lcore 2)
> Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
>   vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
>   Name                    State      Calls    Vectors   Suspends  Clocks   Vectors/Call
>   dpdk-input              polling    34157    17488384  0         9.51e-1  512.00
>   eth0-output             active     34157    8744192   0         1.66e-1  256.00
>   eth0-tx                 active     34157    8744192   0         1.84e0   256.00
>   eth1-output             active     34157    8744192   0         1.71e-1  256.00
>   eth1-tx                 active     34157    8744192   0         1.88e0   256.00
>   ethernet-input          active     68314    17488384  0         4.60e-1  256.00
>   ip4-input-no-checksum   active     68314    17488384  0         3.58e-1  256.00
>   ip4-lookup              active     68314    17488384  0         5.29e-1  256.00
>   ip4-rewrite             active     68314    17488384  0         5.78e-1  256.00
>   unix-epoll-input        polling    33       0         0         3.39e1   0.00
>
> Output for command 'show hardware-interfaces':
>
> vpp# sh hardware-interfaces
>   Name                Idx   Link  Hardware
> eth0                   1     up   eth0
>   Link speed: 40 Gbps
>   Ethernet address 3c:fd:fe:bb:d4:10
>   Intel X710/XL710 Family
>     carrier up full duplex mtu 9206
>     flags: admin-up pmd maybe-multiseg rx-ip4-cksum
>     Devargs:
>     rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
>     tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
>     pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
>     max rx packet len: 9728
>     promiscuous: unicast off all-multicast on
>     vlan offload: strip off filter off qinq off
>     rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
>                        outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
>                        scatter keep-crc rss-hash
>     rx offload active: ipv4-cksum jumbo-frame scatter
>     tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
>                        tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
>                        gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
>                        mbuf-fast-free
>     tx offload active: multi-segs
>     rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
>                        ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
>     rss active:        none
>     tx burst mode: Scalar
>     rx burst mode: Vector Neon Scattered
>
> So I am wondering why no-multi-seg option will change the vector rates
> highlighted in red above? Is this phenomenon expected?
>
> I also saw performance drop when no-multi-seg option was not set in
> startup.conf when I sent small packets (like 64 bytes) as traffic
> input (simple IPv4 routing test case with 1 flow). How will VPP behave
> differently if no-multi-seg option is set?
>
> Look forward to getting your feedback.
>
> Thanks,
> Jieqiang Wang