Hi all,

struct rte_eth_dev_data has a member named dev_private and another named
mac_addrs, as shown below:

    struct rte_eth_dev_data {
            ...
            void *dev_private;
                            /**< PMD-specific private data.
                             *   @see rte_eth_dev_release_port()
                             */
            struct ether_addr *mac_addrs;
                            /**< Device Ethernet link address.
                             *   @see rte_eth_dev_release_port()
                             */
            ...
    };

Some drivers, such as mlx5, implement mac_addrs as part of dev_private:

    static struct rte_eth_dev *
    mlx5_dev_spawn(struct rte_device *dpdk_dev,
                   struct ibv_device *ibv_dev,
                   struct mlx5_dev_config config,
                   const struct mlx5_switch_info *switch_info)
    {
            ...
            eth_dev->data->mac_addrs = priv->mac;
            ...
    }

I don't think this is a good habit, because it can cause problems when
dev_private and/or mac_addrs are freed, even though the code carries this
comment:

    /* mac_addrs must not be freed because part of dev_private */
    eth_dev->data->mac_addrs = NULL;

This is fine as long as everything happens in the primary process. But if
this code runs in a secondary process, it crashes the primary: struct
rte_eth_dev_data is shared between the primary and secondary processes, so
clearing the pointer in the secondary clears it for the primary as well.

In my test environment I have two Mellanox ports bonded in 802.3ad mode,
and my DPDK application has rte_pdump enabled. After starting the
application, I launched dpdk-pdump to capture packets, and the application
crashed.

Here is the backtrace:

    (gdb) bt
    #0  0x00005555556a477b in rte_eth_macaddr_get ()
    #1  0x0000555555699cbc in bond_mode_8023ad_periodic_cb ()
    #2  0x00005555556d836c in eal_alarm_callback ()
    #3  0x00005555556d68a2 in eal_intr_thread_main ()
    #4  0x00007ffff67744a4 in start_thread (arg=0x7ffff5d3a700) at pthread_create.c:456
    #5  0x00007ffff62abd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Here is the gdb session for dpdk-pdump:

    (gdb) r
    Starting program: /home/hulinfan/dpdk-stable-18.11.2/build/app/dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=eth7,tx-dev=eth7'
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    EAL: Detected 40 lcore(s)
    EAL: Detected 2 NUMA nodes
    pathname for rte_mem_config: /var/run/dpdk/rte/config
    [New Thread 0x7ffff61d3700 (LWP 5508)]
    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_5507_a3efe2a87b368c
    [New Thread 0x7ffff59d2700 (LWP 5509)]
    EAL: Probing VFIO support...
    EAL: VFIO support initialized
    EAL: PCI device 0000:01:00.0 on NUMA socket 0
    EAL:   probe driver: 8086:10fb net_ixgbe
    EAL: PCI device 0000:01:00.1 on NUMA socket 0
    EAL:   probe driver: 8086:10fb net_ixgbe
    EAL: PCI device 0000:04:00.0 on NUMA socket 0
    EAL:   probe driver: 15b3:1015 net_mlx5
    net_mlx5: port 0 UAR address 0x7ffef4538000 size 4294967296 occupied, please adjust MLX5_UAR_OFFSET or try EAL parameter --base-virtaddr

    Thread 1 "dpdk-pdump" hit Breakpoint 4, mlx5_dev_spawn (dpdk_dev=dpdk_dev@entry=0x5555563d84e0, ibv_dev=<optimized out>, config=..., switch_info=switch_info@entry=0x7fffffffe5a8) at /home/hulinfan/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5.c:1279
    1279            eth_dev->data->mac_addrs = NULL;
    (gdb) l
    1274    }

Let me try to explain:

1) The secondary process (here dpdk-pdump) initializes the EAL environment
   with rte_eal_init();
2) It probes all buses and devices/drivers with rte_bus_probe();
3) The PCI probe handler mlx5_pci_probe() is called;
4) mlx5_dev_spawn() is called;
5) mlx5_uar_init_secondary() is called and fails;
6) The error path cleans up resources and sets
   eth_dev->data->mac_addrs = NULL;
7) The primary's alarm callback bond_mode_8023ad_periodic_cb() tries to
   read the MAC address with rte_eth_macaddr_get() and crashes.
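
The sequence above can be modelled in a few lines of plain C. This is only a
sketch, not DPDK code: struct shared_dev_data, struct priv_data, enum
proc_type and the cleanup_*() functions are hypothetical stand-ins, and a
single process plays both roles to imitate the shared rte_eth_dev_data.
cleanup_guarded() shows one possible mitigation (leave the shared pointer
alone unless running as primary); that is my assumption, not a fix taken
from any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real DPDK definitions. */
struct mac_addr { uint8_t bytes[6]; };

/* Plays the role of dev_private: the MAC table lives inside it. */
struct priv_data { struct mac_addr mac[1]; };

/* Plays the role of rte_eth_dev_data, which both processes can see. */
struct shared_dev_data {
	void *dev_private;
	struct mac_addr *mac_addrs;
};

enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Error path as in the report: clears the shared pointer no matter
 * which process runs it. */
static void cleanup_unguarded(struct shared_dev_data *data, enum proc_type type)
{
	(void)type;
	data->mac_addrs = NULL;
}

/* Assumed mitigation: only the primary resets the shared pointer. */
static void cleanup_guarded(struct shared_dev_data *data, enum proc_type type)
{
	if (type == PROC_PRIMARY)
		data->mac_addrs = NULL;
}

/* Primary-side read. Returns 0 on success, or -1 in the situation where
 * the primary in the report crashed (the pointer has been cleared). */
static int primary_read_mac(const struct shared_dev_data *data, struct mac_addr *out)
{
	if (data->mac_addrs == NULL)
		return -1;
	*out = data->mac_addrs[0];
	return 0;
}

/* The primary sets up the port, a secondary's probe fails and runs the
 * cleanup path, then the primary reads the MAC address. */
int run_scenario(int guarded)
{
	struct priv_data priv = { .mac = { { .bytes = { 0x02, 0, 0, 0, 0, 0x01 } } } };
	struct shared_dev_data data = { .dev_private = &priv, .mac_addrs = priv.mac };
	struct mac_addr out;

	assert(data.mac_addrs != NULL); /* state after the primary's probe */
	if (guarded)
		cleanup_guarded(&data, PROC_SECONDARY);
	else
		cleanup_unguarded(&data, PROC_SECONDARY);
	return primary_read_mac(&data, &out);
}
```

run_scenario(0) returns -1, which corresponds to the crash in step 7;
run_scenario(1) returns 0 because the secondary leaves the pointer intact.
In a real driver the equivalent test would presumably be
rte_eal_process_type() == RTE_PROC_PRIMARY.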

Please confirm whether the following drivers have the same bug:

---- part of dev_private Matches (12 in 9 files) ----
fs_eth_dev_create in failsafe.c (drivers\net\failsafe) :
    /* mac_addrs must not be freed alone because part of dev_private */
fs_rte_eth_free in failsafe.c (drivers\net\failsafe) :
    /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_vmbus_release in hn_ethdev.c (drivers\net\netvsc) :
    /* mac_addrs must not be freed alone because part of dev_private */
mlx4_pci_probe in mlx4.c (drivers\net\mlx4) :
    /* mac_addrs must not be freed because part of dev_private */
mlx5_dev_close in mlx5.c (drivers\net\mlx5) :
    * rte_eth_dev_release_port(). mac_addrs is part of dev_private so
mlx5_dev_spawn in mlx5.c (drivers\net\mlx5) :
    /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_af_packet_remove in rte_eth_af_packet.c (drivers\net\af_packet) :
    /* mac_addrs must not be freed alone because part of dev_private */
eth_kni_remove in rte_eth_kni.c (drivers\net\kni) :
    /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_null_remove in rte_eth_null.c (drivers\net\null) :
    /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_ring_remove in rte_eth_ring.c (drivers\net\ring) :
    /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_tap_create in rte_eth_tap.c (drivers\net\tap) :
    /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_tap_remove in rte_eth_tap.c (drivers\net\tap) :
    /* mac_addrs must not be freed alone because part of dev_private */

Linfan Hu
zhongdahulin...@163.com