https://bugs.dpdk.org/show_bug.cgi?id=334
Bug ID: 334
Summary: ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
Product: DPDK
Version: 18.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: y...@nttv6.jp
Target Milestone: ---

I'm writing my own DPDK application, and it crashes inside an mlx5 driver
function. It does not crash under a 10Gbps load, but it does under a 50Gbps
load (or higher; 90Gbps was also tested and resulted in a similar crash).
Both loads were applied to a 100GbE port.

4 cores (4 rxqs, 1-to-1) were assigned to the port.
48 txqs were assigned to the port.

The port's device is:
Mellanox Technologies MT27700 Family [ConnectX-4]
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64 on Ubuntu 18.04.2 LTS
4.15.0-50-generic

$ sudo mstflint -d 86:00.0 q
Image type:            FS3
FW Version:            12.17.2020
FW Release Date:       22.11.2016
Description:           UID                GuidsNumber
Base GUID:             N/A                     4
Base MAC:              00900b65b390            4
Orig Base MAC:         N/A                     4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  LNR3270110033
Security Attributes:   N/A

(I couldn't update the firmware because of the PSID.)

The backtrace of the crash:

Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721       *__P = __B;
(gdb) bt
#0  0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1  rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2  0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
    pkts_n=32, err=0x7ffff1505978)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3  0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
    pkts=0x7ffff1514a40, pkts_n=32)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4  0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
    rx_pkts=0x7ffff1514a40, nb_pkts=32)
    at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(our DPDK application functions follow.)

The crash always reproduces.
The same happened in DPDK 19.05.0.

When the crash occurs, in frame 1: rxq_cq_decompress_v():
(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0

It seems that the initialization of struct rte_mbuf *elts[] sometimes goes
wrong.
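
One possible reading of the gdb values above, offered as a sketch rather than a confirmed diagnosis: if elts[pos + 3] is NULL and the faulting store targets a member located 0x10 bytes into struct rte_mbuf, the store address would be exactly the __P=0x10 seen in frame 0. The snippet below illustrates that arithmetic and a simple NULL check over a 4-entry window. It does not use DPDK: struct mock_mbuf, rearm_store_addr() and first_null_in_window() are hypothetical names, and the layout (a member named rearm_data at offset 0x10 on x86-64) is an assumption, not something taken from the driver source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct rte_mbuf. Only the leading fields are
 * modelled, and the layout is an ASSUMPTION chosen so that rearm_data sits
 * at offset 0x10 on x86-64 (8-byte pointer + 8-byte integer before it).
 */
struct mock_mbuf {
	void *buf_addr;          /* offset 0x00 */
	uint64_t buf_iova;       /* offset 0x08 */
	uint8_t rearm_data[16];  /* offset 0x10 */
};

/*
 * Address that a store to elts[idx]->rearm_data would touch. It is computed
 * with integer arithmetic so that a NULL entry is never dereferenced.
 */
static uintptr_t
rearm_store_addr(struct mock_mbuf *const *elts, size_t idx)
{
	return (uintptr_t)elts[idx] + offsetof(struct mock_mbuf, rearm_data);
}

/*
 * Return the index of the first NULL entry in elts[pos .. pos + n - 1],
 * or -1 if every entry in that window is non-NULL.
 */
static long
first_null_in_window(struct mock_mbuf *const *elts, size_t pos, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (elts[pos + i] == NULL)
			return (long)(pos + i);
	}
	return -1;
}
```

With the reported values (pos = 116, elts[pos + 3] == NULL), such a check over the window starting at pos with n = 4 would return 119, and the computed store address for that entry would be 0x10 under the assumed layout.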
(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600, 0x1dfcc04c0,
0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0, 0x1df7876c0, 0x1df0adfc0,
0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480, 0x1ded87800, 0x1e07a64c0, 0x1dec066c0,
0x1dc59d9c0, 0x1de3ae540, 0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0,
0x1dffb3ec0, 0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80,
0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540, 0x1de10aa00,
0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00, 0x1dff20300, 0x1e05a4dc0,
0x1e0182800, 0x1e0257a00, 0x1e0323100, 0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0,
0x1de2c3680, 0x1dfd54080, 0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900,
0x1df35b600, 0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840,
0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0, 0x1dae22840,
0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480, 0x1dfe9fd40, 0x1db58d440,
0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300, 0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0,
0x1df724200, 0x1df9cf180, 0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80,
0x1dd81c600, 0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40,
0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600, 0x1e0502980,
0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740, 0x1dd749ac0, 0x1e0d1e240,
0x1df86b140, 0x1df9013c0, 0x1dfc31680, 0x1dfa15540, 0x1e03694c0, 0x1e06dfb40,
0x1dfdf8b80, 0x1ddd8cf40, 0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380,
0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0,
0x0, 0x0, 0x0, 0x7ffff7ff487c}

FYI:
The same DPDK application works fine on DPDK-18.11.2, even under high load,
on a Mellanox Technologies MT27800 Family [ConnectX-5] with the firmware
below.
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz on Ubuntu 18.04.2 LTS
4.15.0-54-generic.

# mstflint -d 3b:00.0 q
Image type:            FS4
FW Version:            16.24.1000
FW Release Date:       26.11.2018
Product Version:       16.24.1000
Rom Info:              type=UEFI version=14.17.11 cpu=AMD64
                       type=PXE version=3.5.603 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             506b4b0300086c56        8
Base MAC:              506b4b086c56            8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000008
Security Attributes:   N/A

--
You are receiving this mail because:
You are the assignee for the bug.