On 2/23/2021 12:05 PM, Elad Nachman wrote:
This version 2 of the patch leverages on Stephen Hemminger's 64106
patch from Dec 2019,
and fixes the issues reported by Ferruh and Igor:

A. KNI sync lock is being locked while rtnl is held.
If two threads are calling kni_net_process_request() ,
then the first one will take the sync lock, release rtnl lock then sleep.
The second thread will try to lock sync lock while holding rtnl.
The first thread will wake, and try to lock rtnl, resulting in a deadlock.
The remedy is to release rtnl before locking the KNI sync lock.
Since in between nothing is accessing Linux network-wise,
no rtnl locking is needed.

B. There is a race condition in __dev_close_many() processing the
close_list while the application terminates.
It looks like if two vEth devices are terminating,
and one releases the rtnl lock, the other takes it,
updating the close_list in an unstable state,
causing the close_list to become a circular linked list,
hence list_for_each_entry() will endlessly loop inside
__dev_close_many() .
Since the description for the original patch indicate the
original motivation was bringing the device up,
I have changed kni_net_process_request() to hold the rtnl mutex
in case of bringing the device down since this is the path called
from __dev_close_many() , causing the corruption of the close_list.

Depends-on: patch-64106 ("kni: fix kernel deadlock when using mlx devices")
>

Can you please make new version of the patches on top of latest git head, not exiting patches, we don't support incremental updates.


Signed-off-by: Elad Nachman <ela...@gmail.com>
---
V2:
* rebuild the patch as increment from patch 64106
* fix comment and blank lines

---
  kernel/linux/kni/kni_net.c | 25 +++++++++++++++++--------
  1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index f0b6e9a8d..b41360220 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -110,9 +110,22 @@ kni_net_process_request(struct net_device *dev, struct 
rte_kni_request *req)
        void *resp_va;
        uint32_t num;
        int ret_val;
+       int req_is_dev_stop = 0;
+

One more thing, can you please add comment to code why "stop" request is special? You have it in the commit log, but a short description in code also cna be helpful.

Reply via email to