https://bugs.dpdk.org/show_bug.cgi?id=816
Bug ID: 816
Summary: KNI deadlocks while processing mac address set request with linux kernel version >= v5.12
Product: DPDK
Version: 21.08
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: sahithi.sin...@oracle.com
Target Milestone: ---

Starting with Linux kernel v5.12, a new global semaphore (dev_addr_sem) was introduced in dev_set_mac_address_user(). It must be acquired and released along with rtnl_lock whenever a MAC address set request is received from userspace.

This introduces the following locking sequence in the Linux kernel:

1. dev_ioctl() takes rtnl_lock first.
2. dev_set_mac_address_user() then calls down_write(&dev_addr_sem).
3. Finally, kni_net_set_mac() is called, which calls kni_net_process_request().
4. kni_net_process_request() releases only rtnl_lock, not dev_addr_sem, before enqueuing the request to req_q (i.e. to the userspace DPDK process).
5. After receiving a response or timing out, kni_net_process_request() tries to take rtnl_lock again.

This sequence in KNI results in a deadlock: rtnl_lock is released without releasing the semaphore, while another device may be waiting for dev_addr_sem while holding rtnl_lock.

For example, suppose a user issues two MAC address set requests in quick succession on two KNI interfaces, intf1 and intf2. Then:

1. intf1 takes rtnl_lock.
2. intf1 takes dev_addr_sem.
3. intf2 waits for rtnl_lock.
4. intf1, in KNI, releases rtnl_lock.
5. intf2 takes rtnl_lock and waits for dev_addr_sem, held by intf1.
6. intf1, at the end of the KNI request handling code, tries to take rtnl_lock, which is now held by intf2. But intf2 will not release rtnl_lock, because it is waiting for dev_addr_sem, held by intf1.

In the end, intf1 holds dev_addr_sem and waits for rtnl_lock, while intf2 holds rtnl_lock and waits for dev_addr_sem, resulting in a kernel deadlock.
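The intf1/intf2 interleaving described above can be replayed in user space. What follows is only a rough sketch, not kernel or KNI code: two Python threads stand in for the two ioctl callers, plain threading.Lock objects stand in for rtnl_lock and for the write side of dev_addr_sem, and the function name simulate_mac_set_race and the timeout parameter are made up for illustration. Timed acquires are used so the cycle is reported instead of hanging the process.

```python
import threading


def simulate_mac_set_race(timeout=0.2):
    """Replay the six steps with two threads.

    Returns (intf1_got_rtnl, intf2_got_sem); (False, False) means each side
    timed out waiting for the lock the other one holds.
    """
    rtnl_lock = threading.Lock()
    dev_addr_sem = threading.Lock()   # assumption: models only down_write()
    intf1_holds_both = threading.Event()
    intf2_holds_rtnl = threading.Event()
    both_reported = threading.Barrier(2)
    outcome = {}

    def intf1():
        rtnl_lock.acquire()           # step 1
        dev_addr_sem.acquire()        # step 2
        intf1_holds_both.set()
        rtnl_lock.release()           # step 4: only rtnl_lock is dropped
        intf2_holds_rtnl.wait()       # stands in for waiting on the userspace reply
        got = rtnl_lock.acquire(timeout=timeout)      # step 6
        outcome["intf1_got_rtnl"] = got
        both_reported.wait()          # hold everything until both sides reported
        if got:
            rtnl_lock.release()
        dev_addr_sem.release()

    def intf2():
        intf1_holds_both.wait()
        rtnl_lock.acquire()           # steps 3 and 5: blocks until intf1 drops it
        intf2_holds_rtnl.set()
        got = dev_addr_sem.acquire(timeout=timeout)   # step 5: held by intf1
        outcome["intf2_got_sem"] = got
        both_reported.wait()
        if got:
            dev_addr_sem.release()
        rtnl_lock.release()

    threads = [threading.Thread(target=intf1), threading.Thread(target=intf2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["intf1_got_rtnl"], outcome["intf2_got_sem"]


if __name__ == "__main__":
    got_rtnl, got_sem = simulate_mac_set_race()
    print("lock cycle reproduced:", not got_rtnl and not got_sem)
```

In the kernel neither wait has a timeout, so the state this model reports as (False, False) is the point at which both callers would block forever.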
This issue started with the changes from commit 631217c761353aa5e4e548a20e570245ecbc8eda (kni: fix kernel deadlock with bifurcated device) and occurs with Linux kernel versions >= v5.12.