https://bugs.dpdk.org/show_bug.cgi?id=382
Bug ID: 382 Summary: rte_eth: rx/tx callbacks invoked without lock protection Product: DPDK Version: 18.11 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: ethdev Assignee: dev@dpdk.org Reporter: zhongdahulin...@163.com Target Milestone: --- Hi, all I launch my DPDK app, and then use dpdk-pdump to capture wire packets. When I stop dpdk-pdump with ctrl+c, my DPDK app crash. Here is the coredump backtrace: ``` [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/usr/sbin/dpvs -- -l 0,1,2,3,4,5,6,7,8 -w 0000:04:00.0 -w 0000:04:00.1 --legacy'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00005555558d24e3 in rte_eth_rx_burst (port_id=1, queue_id=2, rx_pkts=0x55555721ed10 <lcore_conf+2386256>, nb_pkts=32) at /builds/lbc/kigw/kigw_scripts/dpdk-stable-18.11.2/build/include/rte_ethdev.h:3888 3888 /builds/lbc/kigw/kigw_scripts/dpdk-stable-18.11.2/build/include/rte_ethdev.h: No such file or directory. [Current thread is 1 (Thread 0x7ffff3d36700 (LWP 29228))] (gdb) bt #0 0x00005555558d24e3 in rte_eth_rx_burst (port_id=1, queue_id=2, rx_pkts=0x55555721ed10 <lcore_conf+2386256>, nb_pkts=32) at /builds/lbc/kigw/kigw_scripts/dpdk-stable-18.11.2/build/include/rte_ethdev.h:3888 #1 0x00005555558d8743 in netif_rx_burst (pid=1, qconf=0x55555721ed00 <lcore_conf+2386240>) at /builds/lbc/kigw/src/netif.c:1618 #2 0x00005555558dc331 in lcore_job_recv_fwd (arg=0x0) at /builds/lbc/kigw/src/netif.c:2424 #3 0x00005555558e16de in do_lcore_job (job=0x555555f6f980 <netif_jobs>) at /builds/lbc/kigw/src/netif.c:4265 #4 0x00005555558e184d in netif_loop (dummy=0x0) at /builds/lbc/kigw/src/netif.c:4307 #5 0x00005555556cfb9d in eal_thread_loop () #6 0x00007ffff67744a4 in start_thread (arg=0x7ffff3d36700) at pthread_create.c:456 #7 0x00007ffff62abd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 ``` I followed the source code in rte_ethdev.h where the segmentation fault occurred in line 3888: ``` static inline uint16_t rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, const uint16_t nb_pkts) { struct rte_eth_dev *dev = &rte_eth_devices[port_id]; uint16_t nb_rx; #ifdef RTE_LIBRTE_ETHDEV_DEBUG RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0); RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0); if (queue_id >= dev->data->nb_rx_queues) { RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id); return 0; } #endif nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts); #ifdef RTE_ETHDEV_RXTX_CALLBACKS if (unlikely(dev->post_rx_burst_cbs[queue_id] != NULL)) { struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id]; do { nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx, nb_pkts, cb->param); cb = cb->next; } while (cb != NULL); } #endif return nb_rx; } ``` Found that callback list doesn't protected by any lock. These may have concurrent issues when another process like dpdk-pdump unregister tx/rx callbacks before it exits. The registration/unregistration of tx/rx callbacks hold a spinlock. I wonder why there is no lock in invocation of callback list. ``` int rte_eth_remove_rx_callback(uint16_t port_id, uint16_t queue_id, const struct rte_eth_rxtx_callback *user_cb) { #ifndef RTE_ETHDEV_RXTX_CALLBACKS return -ENOTSUP; #endif /* Check input parameters. */ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); if (user_cb == NULL || queue_id >= rte_eth_devices[port_id].data->nb_rx_queues) return -EINVAL; struct rte_eth_dev *dev = &rte_eth_devices[port_id]; struct rte_eth_rxtx_callback *cb; struct rte_eth_rxtx_callback **prev_cb; int ret = -EINVAL; rte_spinlock_lock(&rte_eth_rx_cb_lock); prev_cb = &dev->post_rx_burst_cbs[queue_id]; for (; *prev_cb != NULL; prev_cb = &cb->next) { cb = *prev_cb; if (cb == user_cb) { /* Remove the user cb from the callback list. */ *prev_cb = cb->next; ret = 0; break; } } rte_spinlock_unlock(&rte_eth_rx_cb_lock); return ret; } ``` Please check and conform if this is a bug. Thanks, Linfan Hu -- You are receiving this mail because: You are the assignee for the bug.