Yes, this is expected: 'no-multi-seg' tells DPDK that every packet consists 
of exactly one buffer (no chained buffers). Many DPDK PMDs support 
vectorization (SSE, NEON...) only for this simpler case. When you set this 
option, DPDK can select the vectorized PMD code path instead of the more 
generic, non-vectorized (and hence slower) one.
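
For reference, a minimal sketch of where the option goes in startup.conf. Only 
the 'dpdk' section is shown and any other settings you already have there are 
assumed unchanged. Note that with the option set, 'jumbo-frame' and 'scatter' 
are no longer listed under 'rx offload active' in your output, so this sketch 
assumes you do not need packets that span several buffers.

```
dpdk {
  # Every packet fits in exactly one buffer, so DPDK may pick the
  # vectorized RX/TX burst functions where the PMD provides them.
  no-multi-seg
}
```
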
You can see in the 'show hardware-interfaces' output that with 'no-multi-seg' 
you get 'Vector Neon' for both RX and TX, whereas without it only RX is 
vectorized ('Vector Neon Scattered') and TX falls back to 'Scalar'. So in the 
2nd case DPDK TX is slower. This is reflected in the 'show run' output, where 
the clocks/packet for 'eth0-tx' and 'eth1-tx' grow from about 0.8 to about 
1.8. Because the TX cost is higher, VPP spends more time on each packet, so it 
processes bigger vectors (internal node vector rate 256.00 instead of 85.33) 
and forwards fewer pps overall (1.0186e7 instead of 1.2537e7).
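
To make the arithmetic concrete, here is a rough sketch in Python that adds up 
the per-node clocks from the two 'show run' outputs quoted below. It assumes 
each packet visits every shared node once plus one output/tx pair (the vector 
counts show traffic split evenly across eth0 and eth1), and it ignores 
unix-epoll-input. The dictionary and function names are made up for this 
illustration.

```python
# Clocks per packet per graph node, copied from the quoted 'show run' outputs.
NO_MULTI_SEG = {
    "dpdk-input": 0.934, "ethernet-input": 0.565,
    "ip4-input-no-checksum": 0.383, "ip4-lookup": 0.534, "ip4-rewrite": 0.573,
    "eth0-output": 0.204, "eth1-output": 0.191,
    "eth0-tx": 0.786, "eth1-tx": 0.793,
}
MULTI_SEG = {
    "dpdk-input": 0.951, "ethernet-input": 0.460,
    "ip4-input-no-checksum": 0.358, "ip4-lookup": 0.529, "ip4-rewrite": 0.578,
    "eth0-output": 0.166, "eth1-output": 0.171,
    "eth0-tx": 1.84, "eth1-tx": 1.88,
}
# 'vector rates in' from the same outputs, in packets per second.
PPS = {"no-multi-seg": 1.2537e7, "multi-seg": 1.0186e7}

SHARED_NODES = ("dpdk-input", "ethernet-input", "ip4-input-no-checksum",
                "ip4-lookup", "ip4-rewrite")


def clocks_per_packet(run):
    """Sum the shared nodes, plus the average of the two output/tx pairs."""
    shared = sum(run[node] for node in SHARED_NODES)
    output = (run["eth0-output"] + run["eth1-output"]) / 2
    tx = (run["eth0-tx"] + run["eth1-tx"]) / 2
    return shared + output + tx


fast = clocks_per_packet(NO_MULTI_SEG)
slow = clocks_per_packet(MULTI_SEG)
print(f"clocks/packet: {fast:.3f} (no-multi-seg) vs {slow:.3f} (multi-seg)")
print(f"cost ratio:    {slow / fast:.3f}")
print(f"pps ratio:     {PPS['no-multi-seg'] / PPS['multi-seg']:.3f}")
```

Both ratios come out at roughly 1.23, which is what you would expect if the 
worker is CPU-bound: the extra clocks per packet, almost all of them in the 
tx nodes, account for the drop in pps.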

ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Jieqiang Wang
> Sent: Friday, 8 January 2021 04:26
> To: vpp-dev <vpp-dev@lists.fd.io>
> Cc: Lijian Zhang <lijian.zh...@arm.com>; Tianyu Li <tianyu...@arm.com>;
> Govindarajan Mohandoss <govindarajan.mohand...@arm.com>; nd <n...@arm.com>
> Subject: [vpp-dev] Questions about no-multi-seg option in startup.conf
> 
> Hi VPP dev,
> 
> 
> 
> I was doing some benchmarking on VPP and found that the no-multi-seg
> option in startup.conf affects both the performance and what the
> runtime statistics show.
> 
> The VPP version is v21.01-rc0~547-gf0419a0c8, the DPDK version is DPDK
> 20.11.0.
> 
> With no-multi-seg option set in the startup.conf, the runtime shows like
> the following:
> 
> Thread 1 vpp_wk_0 (lcore 2)
> Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
>   vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> dpdk-input                       polling            112527        14403456               0         9.34e-1          128.00
> eth0-output                      active             112527         7201728               0         2.04e-1           64.00
> eth0-tx                          active             112527         7201728               0         7.86e-1           64.00
> eth1-output                      active             112527         7201728               0         1.91e-1           64.00
> eth1-tx                          active             112527         7201728               0         7.93e-1           64.00
> ethernet-input                   active             225054        14403456               0         5.65e-1           64.00
> ip4-input-no-checksum            active             112527        14403456               0         3.83e-1          128.00
> ip4-lookup                       active             112527        14403456               0         5.34e-1          128.00
> ip4-rewrite                      active             112527        14403456               0         5.73e-1          128.00
> unix-epoll-input                 polling               110               0               0          2.84e1            0.00
> 
> 
> 
> Output for command 'show hardware-interfaces':
> 
> vpp# sh hardware-interfaces
>               Name                Idx   Link  Hardware
> eth0                               1     up   eth0
>   Link speed: 40 Gbps
>   Ethernet address 3c:fd:fe:bb:d4:10
>   Intel X710/XL710 Family
>     carrier up full duplex mtu 9206
>     flags: admin-up pmd rx-ip4-cksum
>     Devargs:
>     rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
>     tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
>     pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
>     max rx packet len: 9728
>     promiscuous: unicast off all-multicast on
>     vlan offload: strip off filter off qinq off
>     rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
>                        outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
>                        scatter keep-crc rss-hash
>     rx offload active: ipv4-cksum
>     tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
>                        tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
>                        gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
>                        mbuf-fast-free
>     tx offload active: none
>     rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
>                        ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
>     rss active:        none
>     tx burst mode: Vector Neon
>     rx burst mode: Vector Neon
> 
> 
> 
> Without the no-multi-seg option in startup.conf, the runtime shows as below:
> 
> Thread 1 vpp_wk_0 (lcore 2)
> Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
>   vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> dpdk-input                       polling             34157        17488384               0         9.51e-1          512.00
> eth0-output                      active              34157         8744192               0         1.66e-1          256.00
> eth0-tx                          active              34157         8744192               0          1.84e0          256.00
> eth1-output                      active              34157         8744192               0         1.71e-1          256.00
> eth1-tx                          active              34157         8744192               0          1.88e0          256.00
> ethernet-input                   active              68314        17488384               0         4.60e-1          256.00
> ip4-input-no-checksum            active              68314        17488384               0         3.58e-1          256.00
> ip4-lookup                       active              68314        17488384               0         5.29e-1          256.00
> ip4-rewrite                      active              68314        17488384               0         5.78e-1          256.00
> unix-epoll-input                 polling                33               0               0          3.39e1            0.00
> 
> 
> 
> Output for command 'show hardware-interfaces':
> 
> vpp# sh hardware-interfaces
>               Name                Idx   Link  Hardware
> eth0                               1     up   eth0
>   Link speed: 40 Gbps
>   Ethernet address 3c:fd:fe:bb:d4:10
>   Intel X710/XL710 Family
>     carrier up full duplex mtu 9206
>     flags: admin-up pmd maybe-multiseg rx-ip4-cksum
>     Devargs:
>     rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
>     tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
>     pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
>     max rx packet len: 9728
>     promiscuous: unicast off all-multicast on
>     vlan offload: strip off filter off qinq off
>     rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
>                        outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
>                        scatter keep-crc rss-hash
>     rx offload active: ipv4-cksum jumbo-frame scatter
>     tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
>                        tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
>                        gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
>                        mbuf-fast-free
>     tx offload active: multi-segs
>     rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
>                        ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
>     rss active:        none
>     tx burst mode: Scalar
>     rx burst mode: Vector Neon Scattered
> 
> 
> 
> So I am wondering why the no-multi-seg option changes the vector rates
> (highlighted in red in the original message). Is this expected?
> 
> I also saw a performance drop when the no-multi-seg option was not set in
> startup.conf and I sent small packets (e.g. 64 bytes) as traffic
> input (simple IPv4 routing test case with 1 flow). How does VPP behave
> differently when the no-multi-seg option is set?
> 
> 
> 
> Look forward to getting your feedback.
> 
> 
> 
> Thanks,
> 
> Jieqiang Wang
> 
> 
> 