The same crash happens with DPDK 19.05 as well.

When the crash happens,
mcqe_n == t_pkt->data_len == 124.

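struct rte_mbuf **elts (which seems to be prepared elsewhere)
is presumably supposed to contain 124 valid mbufs,
but (perhaps only under significant load?) it doesn't:
entries 49-52 repeat the same pointer, and what follows is NULL
or values that don't look like mbuf pointers.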

(gdb) p/x (void*[124])elts[0]
$31 = {0x1d0bd0d80, 0x1d1cfef80, 0x1d28f6a40, 0x1d22eb100, 0x1d195a8c0, 
  0x1d2137200, 0x1d1eb5540, 0x1d1d0fec0, 0x1d28ecf40, 0x1d19b1bc0, 
  0x1cec8a200, 0x1d02e2980, 0x1d085cdc0, 0x1d04e8e00, 0x1ccb4e140, 
  0x1d1e17e80, 0x1d17a1c40, 0x1d14a6e00, 0x1d2871700, 0x1d20b6c40, 
  0x1d29831c0, 0x1d04941c0, 0x1d0921080, 0x1d070ea40, 0x1d148ea80, 
  0x1cee100c0, 0x1d1a47e40, 0x1d0ee6600, 0x1d02f1200, 0x1d24bc100, 
  0x1d1e84e40, 0x1d1e1f2c0, 0x1d28b7ac0, 0x1d2195940, 0x1d21bc540, 
  0x1d228f080, 0x1d1026100, 0x1d285e100, 0x1d211c7c0, 0x1d2128980, 
  0x1d1787200, 0x1d170e080, 0x1d1e0e380, 0x1ce638500, 0x1d21a6880, 
  0x1d20d8ac0, 0x1d25e8600, 0x1d2377880, 0x1d0e13ac0, 0x1c0c07100, 
  0x1c0c07100, 0x1c0c07100, 0x1c0c07100, 0x0, 0x0, 0x0, 0x0, 0x7ffff7ff487c, 
  0x1c0c06f00, 0x1c0c08b00, 0x0, 0x0, 0x7ffff7ff207c, 0x1, 0x1480, 
  0x140000000, 0x100000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
  0x40000000ffffffff, 0x4000000000000001, 0xcd64010000000002, 
  0x0 <repeats 46 times>}

(gdb) p elts[48]
$38 = (struct rte_mbuf *) 0x1d0e13ac0
(gdb) p elts[49]
$39 = (struct rte_mbuf *) 0x1c0c07100
(gdb) p elts[50]
$40 = (struct rte_mbuf *) 0x1c0c07100
(gdb) p elts[51]
$41 = (struct rte_mbuf *) 0x1c0c07100
(gdb) p elts[52]
$42 = (struct rte_mbuf *) 0x1c0c07100
(gdb) p elts[53]
$43 = (struct rte_mbuf *) 0x0
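
To check this without eyeballing the dump, something like the
following could be run over elts just before the decompress call.
This is only a minimal sketch: first_suspect_slot is a hypothetical
helper name of mine, not a DPDK or mlx5 PMD function, and it assumes
that a NULL slot or a slot repeating its predecessor is enough to
flag the array as bad.

```c
#include <stddef.h>

/*
 * Hypothetical debugging helper (not part of DPDK or the mlx5 PMD).
 * Scans ptrs[0..n-1] and returns the index of the first suspicious
 * slot: a NULL entry, or an entry equal to the one before it.
 * Returns n when no such slot is found.
 */
static size_t first_suspect_slot(void *const *ptrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ptrs[i] == NULL)
            return i;               /* empty slot */
        if (i > 0 && ptrs[i] == ptrs[i - 1])
            return i;               /* same pointer twice in a row */
    }
    return n;                       /* nothing suspicious found */
}
```

Called as first_suspect_slot((void *const *)elts, mcqe_n), it should
stop at index 50 on the dump above, where elts[50] repeats elts[49].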

Any thoughts?

regards,
Yasu

From: Yasuhiro Ohara <y...@nttv6.jp>
Subject: [dpdk-dev] ConnectX-4/mlx5 crashes around rxq_cqe_comp_en?
Date: Sat, 13 Jul 2019 01:38:53 +0900 (JST)
Message-ID: <20190713.013853.751044529514409504.y...@nttv6.jp>

> 
> Hi,
> 
> I get a crash when I put a significant amount of load on ConnectX-4/mlx5,
> i.e., 50 Gbps on a 100GbE port.
> 
> Thread 22 "lcore-slave-19" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffe77ee700 (LWP 33519)]
> 0x0000555555f010a3 in _mm_storeu_si128 (__B=..., __P=0x10)
>     at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
> 721       *__P = __B;
> (gdb) bt
> #0  0x0000555555f010a3 in _mm_storeu_si128 (__B=..., __P=0x10)
>     at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
> #1  rxq_cq_decompress_v (rxq=0x22c910ccc0, cq=0x22c8fd1800, elts=0x22c910d240)
>     at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:421
> #2  0x0000555555f04b42 in rxq_burst_v (rxq=0x22c910ccc0, pkts=0x7fffe77eba40, 
>     pkts_n=32, err=0x7fffe77dc978)
>     at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
> #3  0x0000555555f055ea in mlx5_rx_burst_vec (dpdk_rxq=0x22c910ccc0, 
>     pkts=0x7fffe77eba40, pkts_n=32)
>     at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
> #4  0x0000555555632772 in rte_eth_rx_burst (port_id=4, queue_id=5, 
>     rx_pkts=0x7fffe77eba40, nb_pkts=32)
>     at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
> 
> My environment is:
> 
> Ubuntu 18.04.2 LTS 4.15.0-50-generic
> MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64
> fw_ver: 12.17.2020
> vendor_id: 0x02c9
> vendor_part_id: 4115
> hw_ver: 0x0
> board_id: LNR3270110033
> DPDK 18.11.2
> 
> It looks like the crash occurs in the CQE decompression path.
> 
> dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
> 953         /* Decompress the last CQE if compressed. */
> 954         if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP && comp_idx == n) {
> 955                 assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP));
> 956                 rxq_cq_decompress_v(rxq, &cq[nocmp_n], &elts[nocmp_n]);
> 
> I'm also wondering how I can disable CQE compression via the
> rxq_cqe_comp_en devarg.
> 
> <https://doc.dpdk.org/guides-18.02/nics/mlx5.html>
> 22.5.3. Run-time configuration
> rxq_cqe_comp_en parameter [int]
> 
> Any information or guesses are appreciated.
> 
> Best regards,
> Yasu
> 
