https://bugs.dpdk.org/show_bug.cgi?id=1086
Bug ID: 1086
Summary: Significant TX packet drops with Mellanox NIC (mlx5 PMD)
Product: DPDK
Version: 21.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: an...@vaa.su
Target Milestone: ---

Created attachment 222
  --> https://bugs.dpdk.org/attachment.cgi?id=222&action=edit
testpmd-fec28ca0e3.log.txt

Given 2 servers with 25G Mellanox 2-port NICs:

# dpdk-devbind.py -s

Network devices using kernel driver
===================================
0000:3b:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f0np0 drv=mlx5_core unused=vfio-pci
0000:3b:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' if=ens1f1np1 drv=mlx5_core unused=vfio-pci

The servers are connected directly.

The first server is used as a packet generator, running TRex v2.99 in
stateless mode:

./t-rex-64 -c 16 -i
./trex-console
trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

The second one runs dpdk-testpmd:

OS: Debian GNU/Linux 10 (buster)
uname -r: 4.19.0-21-amd64
ofed_info: MLNX_OFED_LINUX-5.7-1.0.2.0
gcc version 8.3.0 (Debian 8.3.0-6)

When DPDK v21.08 is compiled and testpmd is run this way:

dpdk-testpmd -l 1-17 -n 4 --log-level=debug -- --nb-ports=2 --nb-cores=16 --portmask=0x3 --rxq=8 --txq=8

it handles roughly 17 Mpps per port:

trex>start -f stl/udp_1pkt_range_clients.py -m 17mpps

TRex Port Statistics

port       |         0         |         1         |       total
-----------+-------------------+-------------------+------------------
owner      |              root |              root |
link       |                UP |                UP |
state      |      TRANSMITTING |      TRANSMITTING |
speed      |           25 Gb/s |           25 Gb/s |
CPU util.  |            27.76% |            27.76% |
--         |                   |                   |
Tx bps L2  |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps
Tx bps L1  |        11.42 Gbps |        11.46 Gbps |        22.88 Gbps
Tx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps
Line Util. |            45.7 % |           45.83 % |
---        |                   |                   |
Rx bps     |          8.7 Gbps |         8.73 Gbps |        17.43 Gbps
Rx pps     |           17 Mpps |        17.05 Mpps |        34.05 Mpps
----       |                   |                   |
opackets   |         290928398 |         291050836 |         581979234
ipackets   |         290885740 |         291093159 |         581978899
obytes     |       18619417472 |       18627254464 |       37246671936
ibytes     |       18616688080 |       18629962836 |       37246650916
tx-pkts    |      290.93 Mpkts |      291.05 Mpkts |      581.98 Mpkts
rx-pkts    |      290.89 Mpkts |      291.09 Mpkts |      581.98 Mpkts
tx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB
rx-bytes   |          18.62 GB |          18.63 GB |          37.25 GB
-----      |                   |                   |
oerrors    |                 0 |                 0 |                 0
ierrors    |                 0 |                 0 |                 0

But if we switch to DPDK v21.11, it becomes much worse:

TRex Port Statistics

port       |         0         |         1         |       total
-----------+-------------------+-------------------+------------------
owner      |              root |              root |
link       |                UP |                UP |
state      |      TRANSMITTING |      TRANSMITTING |
speed      |           25 Gb/s |           25 Gb/s |
CPU util.  |            26.06% |            26.06% |
--         |                   |                   |
Tx bps L2  |          8.7 Gbps |         8.72 Gbps |        17.42 Gbps
Tx bps L1  |        11.42 Gbps |        11.45 Gbps |        22.86 Gbps
Tx pps     |        16.99 Mpps |        17.04 Mpps |        34.02 Mpps
Line Util. |           45.66 % |           45.79 % |
---        |                   |                   |
Rx bps     |         3.75 Gbps |         3.76 Gbps |          7.5 Gbps
Rx pps     |         7.32 Mpps |         7.34 Mpps |        14.66 Mpps
----       |                   |                   |
opackets   |         190538147 |         190707494 |         381245641
ipackets   |          82174700 |          82260152 |         164434852
obytes     |       12194441408 |       12205280936 |       24399722344
ibytes     |        5259181520 |        5264649728 |       10523831248
tx-pkts    |      190.54 Mpkts |      190.71 Mpkts |      381.25 Mpkts
rx-pkts    |       82.17 Mpkts |       82.26 Mpkts |      164.43 Mpkts
tx-bytes   |          12.19 GB |          12.21 GB |           24.4 GB
rx-bytes   |           5.26 GB |           5.26 GB |          10.52 GB
-----      |                   |                   |
oerrors    |                 0 |                 0 |                 0
ierrors    |                 0 |                 0 |                 0

It handles only ~7 Mpps for each port, instead of ~17 Mpps!
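
To put a number on the regression, the share of generated packets that made it back to TRex can be worked out from the cumulative "total" column of the two tables above. This is only a minimal sketch: the helper name `returned_fraction` is illustrative and not part of TRex or DPDK, and the counters are copied by hand from the report.

```python
def returned_fraction(opackets: int, ipackets: int) -> float:
    """Share of packets sent by TRex that came back after forwarding."""
    return ipackets / opackets


# Cumulative (opackets, ipackets) from the "total" column of each TRex table.
totals = {
    "21.08": (581979234, 581978899),
    "21.11": (381245641, 164434852),
}

for version, (sent, received) in totals.items():
    frac = returned_fraction(sent, received)
    print(f"DPDK v{version}: {frac:.2%} returned, {1 - frac:.2%} lost")
```

With these counters, v21.08 returns essentially every packet, while v21.11 returns roughly 43% of them, which matches the drop from ~17 Mpps to ~7 Mpps per port.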

testpmd reports a large number of TX drops:

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1101378001     RX-dropped: 0             RX-total: 1101378001
  TX-packets: 1016776861     TX-dropped: 84576754      TX-total: 1101353615
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1101353615     RX-dropped: 0             RX-total: 1101353615
  TX-packets: 1016804108     TX-dropped: 84573893      TX-total: 1101378001
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 2202731616     RX-dropped: 0             RX-total: 2202731616
  TX-packets: 2033580969     TX-dropped: 169150647     TX-total: 2202731616
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Using git bisect, I found the commit (between 21.08 and 21.11) that caused
this problem:

https://github.com/DPDK/dpdk/commit/fec28ca0e3a93143829f3b41a28a8da933f28499

I have also profiled it with Intel VTune 2021.3.0 (-collect hotspots and
-collect memory-access), comparing two revisions:

1. 690b2a88c2 (GOOD)
2. fec28ca0e3 (BAD)

I can try to share the corresponding profiling results if that helps.
Unfortunately, I cannot attach them here (the VTune data is too large).

-- 
You are receiving this mail because:
You are the assignee for the bug.