Hi all,

struct rte_eth_dev_data has a member named dev_private and another named 
mac_addrs, as shown below:


struct rte_eth_dev_data {
...
    void *dev_private;
    /**< PMD-specific private data.
     *   @see rte_eth_dev_release_port()
     */

    struct ether_addr *mac_addrs;
    /**< Device Ethernet link address.
     *   @see rte_eth_dev_release_port()
     */
...
};


Some drivers, such as mlx5, implement mac_addrs as part of dev_private:


static struct rte_eth_dev *
mlx5_dev_spawn(struct rte_device *dpdk_dev,
      struct ibv_device *ibv_dev,
      struct mlx5_dev_config config,
      const struct mlx5_switch_info *switch_info)
{
...
    eth_dev->data->mac_addrs = priv->mac;
...
}


I don't think this is good practice, because it can cause problems when 
dev_private and/or mac_addrs are freed, even though the driver comments on it:


/* mac_addrs must not be freed because part of dev_private */
eth_dev->data->mac_addrs = NULL;


This is fine when everything is done in the primary process. But if this code 
is invoked in a secondary process, it causes the primary to crash.
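To illustrate the point, here is a minimal sketch in plain C. It does not use 
DPDK; every demo_ name is a hypothetical stand-in for the real types. It 
assumes, as in DPDK multi-process mode, that the eth_dev data structure is 
visible to both processes, so a pointer cleared by one side is also cleared 
for the other. demo_macaddr_get() returns -1 where a real reader of 
mac_addrs[0] would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the DPDK types; all demo_ names are hypothetical. */
struct demo_ether_addr {
    uint8_t addr_bytes[6];
};

/* Plays the role of the PMD private data (dev_private). */
struct demo_priv {
    struct demo_ether_addr mac[4];
};

/* Plays the role of struct rte_eth_dev_data, assumed shared by both processes. */
struct demo_eth_dev_data {
    void *dev_private;
    struct demo_ether_addr *mac_addrs;
};

/* Primary-side setup: mac_addrs aliases storage inside dev_private. */
static void demo_primary_spawn(struct demo_eth_dev_data *data,
                               struct demo_priv *priv)
{
    data->dev_private = priv;
    data->mac_addrs = priv->mac;
}

/* Error path: the pointer is cleared so that it is not freed separately. */
static void demo_spawn_error_path(struct demo_eth_dev_data *data)
{
    data->mac_addrs = NULL;
}

/*
 * Reader of mac_addrs[0]. Returns 0 on success and -1 when the pointer is
 * NULL, which is where an unchecked reader would crash.
 */
static int demo_macaddr_get(const struct demo_eth_dev_data *data,
                            struct demo_ether_addr *out)
{
    if (data->mac_addrs == NULL)
        return -1;
    memcpy(out, &data->mac_addrs[0], sizeof(*out));
    return 0;
}

/* Both "processes" operate on the same data object. */
static int demo_shared_data_scenario(void)
{
    struct demo_priv priv = { .mac = { { { 0x02, 0, 0, 0, 0, 0x01 } } } };
    struct demo_eth_dev_data shared = { 0 };
    struct demo_ether_addr mac;

    demo_primary_spawn(&shared, &priv);
    assert(demo_macaddr_get(&shared, &mac) == 0); /* primary reads fine */

    demo_spawn_error_path(&shared);               /* "secondary" cleans up */
    return demo_macaddr_get(&shared, &mac);       /* primary now sees NULL */
}
```

Calling demo_shared_data_scenario() returns -1: the storage inside dev_private 
is still valid, but the shared pointer to it has been lost.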


In my test environment I have two Mellanox ports bonded in 802.3ad mode, and 
my dpdk-app has rte_pdump enabled. After starting my dpdk-app, I launch 
dpdk-pdump to capture packets. My dpdk-app then crashes.


Here is the backtrace of my dpdk-app (the primary process):


(gdb) bt
#0  0x00005555556a477b in rte_eth_macaddr_get ()
#1  0x0000555555699cbc in bond_mode_8023ad_periodic_cb ()
#2  0x00005555556d836c in eal_alarm_callback ()
#3  0x00005555556d68a2 in eal_intr_thread_main ()
#4  0x00007ffff67744a4 in start_thread (arg=0x7ffff5d3a700) at pthread_create.c:456
#5  0x00007ffff62abd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97


Here is dpdk-pdump's backtrace:


(gdb) r
Starting program: /home/hulinfan/dpdk-stable-18.11.2/build/app/dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=eth7,tx-dev=eth7'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected 40 lcore(s)
EAL: Detected 2 NUMA nodes
pathname for rte_mem_config: /var/run/dpdk/rte/config
[New Thread 0x7ffff61d3700 (LWP 5508)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_5507_a3efe2a87b368c
[New Thread 0x7ffff59d2700 (LWP 5509)]
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1015 net_mlx5
net_mlx5: port 0 UAR address 0x7ffef4538000 size 4294967296 occupied, please adjust MLX5_UAR_OFFSET or try EAL parameter --base-virtaddr


Thread 1 "dpdk-pdump" hit Breakpoint 4, mlx5_dev_spawn (dpdk_dev=dpdk_dev@entry=0x5555563d84e0, ibv_dev=<optimized out>, config=..., switch_info=switch_info@entry=0x7fffffffe5a8)
    at /home/hulinfan/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5.c:1279
1279            eth_dev->data->mac_addrs = NULL;
(gdb) l
1274    }


Let me try to explain:
1) The secondary process (e.g. dpdk-pdump) initializes the EAL environment 
with rte_eal_init().
2) It probes all buses and devices/drivers with rte_bus_probe().
3) The PCI probe handler mlx5_pci_probe() is called.
4) mlx5_dev_spawn() is called.
5) mlx5_uar_init_secondary() is called and fails.
6) The error path cleans up resources and sets eth_dev->data->mac_addrs = NULL 
in the device data that is shared with the primary.
7) The primary's alarm callback bond_mode_8023ad_periodic_cb() tries to read 
the MAC address with rte_eth_macaddr_get() and crashes.
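One possible guard against steps 6 and 7 is sketched below in plain C. This 
is only an illustration under assumptions, not the mlx5 code and not 
necessarily the right upstream fix: the sketch_ names are hypothetical, and 
the process-type argument stands in for what a driver could obtain from 
rte_eal_process_type(). The idea is that only the primary, which owns the 
shared device data, resets mac_addrs on the error path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the EAL process type. */
enum sketch_proc_type { SKETCH_PROC_PRIMARY, SKETCH_PROC_SECONDARY };

/* Hypothetical stand-in for the shared part of the ethdev data. */
struct sketch_dev_data {
    const void *mac_addrs;
};

/*
 * Cleanup after a failed spawn. Only the primary resets the shared
 * mac_addrs pointer; a secondary leaves shared fields untouched.
 */
static void sketch_spawn_cleanup(struct sketch_dev_data *shared,
                                 enum sketch_proc_type who)
{
    if (who == SKETCH_PROC_PRIMARY)
        shared->mac_addrs = NULL;
}

/* A failing secondary must not disturb what the primary will read later. */
static int sketch_secondary_failure_keeps_mac(void)
{
    static const unsigned char mac[6] = { 0x02, 0, 0, 0, 0, 0x01 };
    struct sketch_dev_data shared = { mac };

    sketch_spawn_cleanup(&shared, SKETCH_PROC_SECONDARY);
    assert(shared.mac_addrs == mac);
    return shared.mac_addrs != NULL;
}
```

With such a check, a failed probe in dpdk-pdump would leave mac_addrs intact 
for the primary's periodic bonding callback.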


Please confirm whether the following drivers have the same bug:


---- part of dev_private Matches (12 in 9 files) ----
fs_eth_dev_create in failsafe.c (drivers\net\failsafe) : /* mac_addrs must not be freed alone because part of dev_private */
fs_rte_eth_free in failsafe.c (drivers\net\failsafe) : /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_vmbus_release in hn_ethdev.c (drivers\net\netvsc) : /* mac_addrs must not be freed alone because part of dev_private */
mlx4_pci_probe in mlx4.c (drivers\net\mlx4) : /* mac_addrs must not be freed because part of dev_private */
mlx5_dev_close in mlx5.c (drivers\net\mlx5) : * rte_eth_dev_release_port(). mac_addrs is part of dev_private so
mlx5_dev_spawn in mlx5.c (drivers\net\mlx5) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_af_packet_remove in rte_eth_af_packet.c (drivers\net\af_packet) : /* mac_addrs must not be freed alone because part of dev_private */
eth_kni_remove in rte_eth_kni.c (drivers\net\kni) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_null_remove in rte_eth_null.c (drivers\net\null) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_ring_remove in rte_eth_ring.c (drivers\net\ring) : /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_tap_create in rte_eth_tap.c (drivers\net\tap) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_tap_remove in rte_eth_tap.c (drivers\net\tap) : /* mac_addrs must not be freed alone because part of dev_private */




Linfan Hu
zhongdahulin...@163.com