Starting and then stopping a tap device results in the following output:

    tap_lsc_intr_handle_set(): intr callback unregister failed: -2
    free(): invalid pointer

and a core dump is generated due to abort() being called.

Although the stack backtrace below gives line numbers for dpdk-21.11, this
problem still occurs with the current HEAD of the development tree
(commit 7615ec581).

The stack backtrace is:

#0  0x00007f80741d12a2 in raise () from /lib64/libc.so.6
#1  0x00007f80741ba8a4 in abort () from /lib64/libc.so.6
#2  0x00007f8074213ac7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f807421b73c in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f807421c97c in _int_free () from /lib64/libc.so.6
#5  0x00007f80742207a8 in free () from /lib64/libc.so.6
#6  0x00007f8074374bf5 in rte_intr_instance_free (intr_handle=intr_handle@entry=0x1003b2480)
    at ../lib/eal/common/eal_common_interrupts.c:184
#7  0x00007f8072bf16ce in tap_rx_intr_vec_uninstall (dev=dev@entry=0x7f80744ed480 <rte_eth_devices>)
    at ../drivers/net/tap/tap_intr.c:38
#8  0x00007f8072bf18d3 in tap_rx_intr_vec_set (dev=dev@entry=0x7f80744ed480 <rte_eth_devices>, set=set@entry=0)
    at ../drivers/net/tap/tap_intr.c:111
#9  0x00007f8072bea742 in tap_intr_handle_set (dev=dev@entry=0x7f80744ed480 <rte_eth_devices>, set=set@entry=0)
    at ../drivers/net/tap/rte_eth_tap.c:1727
#10 0x00007f8072bea7d0 in tap_dev_stop (dev=0x7f80744ed480 <rte_eth_devices>)
    at ../drivers/net/tap/rte_eth_tap.c:916
#11 0x00007f80744875a4 in rte_eth_dev_stop (port_id=<optimized out>)
    at ../lib/ethdev/rte_ethdev.c:1883
#12 0x000000000040158b in main (argc=4, argv=0x7ffd13cfc368) at tap_free.c:59

A sample program to demonstrate the problem is:

=======================================================================
// Run as: build/tap_free --vdev=net_tap0,remote=PORT -l 0,1

#include <stdio.h>
#include <string.h>

#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

int
main(int argc, char *argv[])
{
	uint16_t port_id;
	struct rte_mempool *mbuf_pool;
	struct rte_eth_conf port_conf;
	struct rte_eth_dev_info dev_info;
	uint16_t nb_rxd = 1024;
	uint16_t nb_txd = 1024;
	struct rte_eth_txconf txconf;

	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

	if (rte_eth_dev_count_avail() < 1)
		rte_exit(EXIT_FAILURE, "Error: should have at least 1 port\n");

	port_id = rte_eth_find_next_owned_by(0, RTE_ETH_DEV_NO_OWNER);

	mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", 1023, 256, 0,
			RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

	if (!rte_eth_dev_is_valid_port(port_id))
		rte_exit(1, "a\n");

	memset(&port_conf, 0, sizeof(struct rte_eth_conf));
	if (rte_eth_dev_info_get(port_id, &dev_info))
		rte_exit(1, "b\n");

	if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
		port_conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;

	/* One Rx queue and one Tx queue; intr_conf.rxq is left unset. */
	if (rte_eth_dev_configure(port_id, 1, 1, &port_conf))
		rte_exit(1, "c\n");

	if (rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &nb_rxd, &nb_txd))
		rte_exit(1, "d\n");

	if (rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
			rte_eth_dev_socket_id(port_id), NULL, mbuf_pool) < 0)
		rte_exit(1, "e\n");

	txconf = dev_info.default_txconf;
	txconf.offloads = port_conf.txmode.offloads;
	if (rte_eth_tx_queue_setup(port_id, 0, nb_txd,
			rte_eth_dev_socket_id(port_id), &txconf) < 0)
		rte_exit(1, "f\n");

	/* Starting the port frees pmd->intr_handle for the first time. */
	if (rte_eth_dev_start(port_id) < 0)
		rte_exit(1, "g\n");

	printf("Calling rte_eth_dev_stop - will error without patch\n");
	fflush(stdout);

	/* Stopping the port frees the stale handle again and aborts. */
	rte_eth_dev_stop(port_id);

	printf("Returned from rte_eth_dev_stop\n");
	fflush(stdout);

	rte_eth_dev_close(port_id);

	rte_eal_cleanup();

	return 0;
}
=======================================

This problem is caused by tap_rx_intr_vec_uninstall() calling
rte_intr_instance_free(), which frees pmd->intr_handle (it also does not set
pmd->intr_handle to NULL, although this is not the cause of the issue).

When tap_rx_intr_vec_uninstall() is called from tap_rx_intr_vec_set() with
the set parameter != 0, which occurs when rte_eth_dev_start() is called, it
frees pmd->intr_handle, and tap_rx_intr_vec_install() is subsequently called.
If intr_conf.rxq is not set, this does not cause an immediate problem, but if
it is set, tap_rx_intr_vec_install() will write to the now unallocated memory.

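The lifetime rule that this analysis points to can be sketched in plain C
without DPDK. This is only an illustration and not the patch itself: every
name below (fake_pmd, fake_intr_handle and the fake_* functions) is
hypothetical, and the real driver uses the rte_intr_* API rather than
calloc()/free(). The idea shown is that the uninstall step releases only
what the install step allocated, while the handle is freed once at close and
the pointer is cleared afterwards.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT the DPDK types. */
struct fake_intr_handle {
	int *vec;	/* per-queue vector list, rebuilt on every start */
	int nb_vec;
};

struct fake_pmd {
	struct fake_intr_handle *intr_handle;	/* lives as long as the device */
};

/* Allocate the handle once, when the device is created. */
static int
fake_dev_create(struct fake_pmd *pmd)
{
	pmd->intr_handle = calloc(1, sizeof(*pmd->intr_handle));
	return pmd->intr_handle != NULL ? 0 : -1;
}

/* Release only what install allocated; the handle itself stays valid. */
static void
fake_vec_uninstall(struct fake_pmd *pmd)
{
	free(pmd->intr_handle->vec);
	pmd->intr_handle->vec = NULL;
	pmd->intr_handle->nb_vec = 0;
}

/* Drop any previous vector list, then build a new one (nb_rxq must be > 0). */
static int
fake_vec_install(struct fake_pmd *pmd, int nb_rxq)
{
	fake_vec_uninstall(pmd);
	pmd->intr_handle->vec = calloc((size_t)nb_rxq, sizeof(int));
	if (pmd->intr_handle->vec == NULL)
		return -1;
	pmd->intr_handle->nb_vec = nb_rxq;
	return 0;
}

/* Free the handle exactly once and clear the pointer so reuse is detectable. */
static void
fake_dev_close(struct fake_pmd *pmd)
{
	if (pmd->intr_handle == NULL)
		return;
	fake_vec_uninstall(pmd);
	free(pmd->intr_handle);
	pmd->intr_handle = NULL;
}
```

With this split, a start/stop/start sequence can call fake_vec_install() and
fake_vec_uninstall() any number of times without touching freed memory, and
a second fake_dev_close() is a harmless no-op.
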
The main problem occurs when tap_dev_stop() is called, which in turn calls
tap_intr_handle_set() and tap_lsc_intr_handle_set(). These use
pmd->intr_handle, which was freed earlier and whose memory has since been
overwritten. When rte_intr_instance_free() is then called via
tap_rx_intr_vec_uninstall(), intr_handle->alloc_flags has been overwritten by
a subsequent user of that memory, so it enters the wrong block and calls
free() rather than rte_free(). This causes free() to call abort().

Quentin Armitage (1):
  tap: fix write-after-free and double free of intr_handle

 drivers/net/tap/rte_eth_tap.c | 5 +++++
 drivers/net/tap/tap_intr.c    | 2 --
 2 files changed, 5 insertions(+), 2 deletions(-)

-- 
2.34.1