Hi VPP dev,

I was benchmarking VPP and found that the no-multi-seg option in startup.conf
affects both the performance and the output of 'show runtime'.
The VPP version is v21.01-rc0~547-gf0419a0c8 and the DPDK version is 20.11.0.
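
For reference, the option goes in the dpdk section of startup.conf. A minimal
fragment, with every other setting from my configuration omitted:

```
dpdk {
  no-multi-seg
}
```
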
With the no-multi-seg option set in startup.conf, the runtime output is as
follows:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.1, 10 sec internal node vector rate 85.33 loops/sec 97035.37
  vector rates in 1.2537e7, out 1.2537e7, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling            112527        14403456               0         9.34e-1          128.00
eth0-output                      active             112527         7201728               0         2.04e-1           64.00
eth0-tx                          active             112527         7201728               0         7.86e-1           64.00
eth1-output                      active             112527         7201728               0         1.91e-1           64.00
eth1-tx                          active             112527         7201728               0         7.93e-1           64.00
ethernet-input                   active             225054        14403456               0         5.65e-1           64.00
ip4-input-no-checksum            active             112527        14403456               0         3.83e-1          128.00
ip4-lookup                       active             112527        14403456               0         5.34e-1          128.00
ip4-rewrite                      active             112527        14403456               0         5.73e-1          128.00
unix-epoll-input                 polling               110               0               0          2.84e1            0.00

Output for command 'show hardware-interfaces':
vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
eth0                               1     up   eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: none
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
                       ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active:        none
    tx burst mode: Vector Neon
    rx burst mode: Vector Neon

Without the no-multi-seg option in startup.conf, the runtime output is as follows:
Thread 1 vpp_wk_0 (lcore 2)
Time 1.7, 10 sec internal node vector rate 256.00 loops/sec 19628.70
  vector rates in 1.0186e7, out 1.0186e7, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling             34157        17488384               0         9.51e-1          512.00
eth0-output                      active              34157         8744192               0         1.66e-1          256.00
eth0-tx                          active              34157         8744192               0          1.84e0          256.00
eth1-output                      active              34157         8744192               0         1.71e-1          256.00
eth1-tx                          active              34157         8744192               0          1.88e0          256.00
ethernet-input                   active              68314        17488384               0         4.60e-1          256.00
ip4-input-no-checksum            active              68314        17488384               0         3.58e-1          256.00
ip4-lookup                       active              68314        17488384               0         5.29e-1          256.00
ip4-rewrite                      active              68314        17488384               0         5.78e-1          256.00
unix-epoll-input                 polling                33               0               0          3.39e1            0.00

Output for command 'show hardware-interfaces':
vpp# sh hardware-interfaces
              Name                Idx   Link  Hardware
eth0                               1     up   eth0
  Link speed: 40 Gbps
  Ethernet address 3c:fd:fe:bb:d4:10
  Intel X710/XL710 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg rx-ip4-cksum
    Devargs:
    rx: queues 1 (max 320), desc 1024 (min 64 max 4096 align 32)
    tx: queues 2 (max 320), desc 1024 (min 64 max 4096 align 32)
    pci: device 8086:1583 subsystem 8086:0001 address 0001:01:00.00 numa 0
    max rx packet len: 9728
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum qinq-strip
                       outer-ipv4-cksum vlan-filter vlan-extend jumbo-frame
                       scatter keep-crc rss-hash
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso outer-ipv4-cksum qinq-insert vxlan-tnl-tso
                       gre-tnl-tso ipip-tnl-tso geneve-tnl-tso multi-segs
                       mbuf-fast-free
    tx offload active: multi-segs
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-sctp ipv4-other ipv6-frag
                       ipv6-tcp ipv6-udp ipv6-sctp ipv6-other l2-payload
    rss active:        none
    tx burst mode: Scalar
    rx burst mode: Vector Neon Scattered

So I am wondering why the no-multi-seg option changes the vector rates shown
above (internal node vector rate 85.33 with the option set versus 256.00
without it). Is this expected?
I also saw a performance drop when no-multi-seg was not set in startup.conf
and I sent small packets (64 bytes, for example) as input traffic (a simple
IPv4 routing test case with 1 flow). How does VPP behave differently when the
no-multi-seg option is set?
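
To make the per-node difference easier to see, here is a minimal Python sketch
that divides each node's Clocks value from the run without no-multi-seg by the
value from the run with it. The numbers are copied from the two runtime outputs
above; the dictionary and function names are only illustrative.

```python
# Clocks column copied from the two 'show runtime' outputs above.
# Names below (clocks_no_multi_seg, clocks_multi_seg, clock_ratios) are
# illustrative only; they are not part of VPP.
clocks_no_multi_seg = {  # run WITH no-multi-seg set
    "dpdk-input": 9.34e-1,
    "ethernet-input": 5.65e-1,
    "ip4-input-no-checksum": 3.83e-1,
    "ip4-lookup": 5.34e-1,
    "ip4-rewrite": 5.73e-1,
    "eth0-output": 2.04e-1,
    "eth0-tx": 7.86e-1,
    "eth1-output": 1.91e-1,
    "eth1-tx": 7.93e-1,
}
clocks_multi_seg = {  # run WITHOUT no-multi-seg set
    "dpdk-input": 9.51e-1,
    "ethernet-input": 4.60e-1,
    "ip4-input-no-checksum": 3.58e-1,
    "ip4-lookup": 5.29e-1,
    "ip4-rewrite": 5.78e-1,
    "eth0-output": 1.66e-1,
    "eth0-tx": 1.84e0,
    "eth1-output": 1.71e-1,
    "eth1-tx": 1.88e0,
}


def clock_ratios(multi_seg, no_multi_seg):
    """Return multi-seg clocks divided by no-multi-seg clocks for each node."""
    return {node: multi_seg[node] / no_multi_seg[node] for node in no_multi_seg}


ratios = clock_ratios(clocks_multi_seg, clocks_no_multi_seg)

# Print nodes from the largest relative cost increase to the smallest.
for node, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node:<24}{ratio:6.2f}x")
```

On these numbers the two tx nodes stand out at roughly 2.3x, while the other
nodes stay within about 20% of each other. That may be related to the tx burst
mode changing from Vector Neon to Scalar in 'show hardware-interfaces'.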

Look forward to getting your feedback.

Thanks,
Jieqiang Wang
