1. Userspace will get an error 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl unlocked for interface down command causes a crash because of a race condition in the device delete/unregister list in the kernel.
FYI, Elad. בתאריך יום ב׳, 4 באוק׳ 2021, 17:13, מאת Ferruh Yigit < ferruh.yi...@intel.com>: > On 10/4/2021 2:09 PM, Elad Nachman wrote: > > Hi, > > > > EAGAIN is propogated back to the kernel and to the caller. > > > > So will the user get an error, or it will be handled by the kernel and > retried? > > > We cannot retry from the kni kernel module since we hold the rtnl lock. > > > > Why not? We are already waiting until a command time out, like > 'kni_net_open()' > can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can > limit the > number of retry for safety. > > > FYI, > > > > Elad > > > > בתאריך יום ב׳, 4 באוק׳ 2021, 16:05, מאת Ferruh Yigit < > > ferruh.yi...@intel.com>: > > > >> On 9/24/2021 11:54 AM, Elad Nachman wrote: > >>> Fix lack of multiple KNI requests handling support by introducing a > >>> request in progress flag which will fail additional requests with > >>> EAGAIN return code if the original request has not been processed > >>> by user-space. > >>> > >>> Bugzilla ID: 809 > >> > >> Hi Eric, > >> > >> Can you please test this patch, if it solves the issue you reported? > >> > >>> > >>> Signed-off-by: Elad Nachman <ela...@gmail.com> > >>> --- > >>> kernel/linux/kni/kni_net.c | 9 +++++++++ > >>> lib/kni/rte_kni.c | 2 ++ > >>> lib/kni/rte_kni_common.h | 1 + > >>> 3 files changed, 12 insertions(+) > >>> > >> > >> <...> > >> > >>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev, > >> struct rte_kni_request *req) > >>> > >>> mutex_lock(&kni->sync_lock); > >>> > >>> + /* Check that existing request has been processed: */ > >>> + cur_req = (struct rte_kni_request *)kni->sync_kva; > >>> + if (cur_req->req_in_progress) { > >>> + ret = -EAGAIN; > >> > >> Overall logic in the KNI looks good to me, this helps to serialize the > >> requests > >> even for async ones. > >> > >> But can you please clarify how it behaves in the kernel side with > '-EAGAIN' > >> return type? Will linux call the ndo again, or will it just fail. > >> > >> If it just fails should we handle the re-try on '-EAGAIN' within the kni > >> module? > >> > >> > >