בתאריך יום ב׳, 4 באוק׳ 2021, 20:00, מאת Eric Christian <ercli...@gmail.com >:
> I am not sure that only we can recreate the KNI request overwrite. We may > be the only ones with a current use case that exposes the vulnerability. > It is possible for any KNI operation to encounter this issue with the new > async mechanism. As long as the call to kni_net_process_request() is a > separate thread from rte_kni_handle_request() this has the potential to > occur with the use of async requests. > > All you need is one async KNI request followed closely by a second KNI > request before the rte_kni_handle_request() has had a chance to process the > first request. > > The kernel dev driver simply returns the error value back to the caller if > it is less than zero. > > > Eric > What I did in order to attempt to recreate this issue was: Create a qemu VM with four cores. Run the KNI sample application. Then: 1. Run ifconfig vEth0_0 down followed immediately by ifconfig vEth0_0 up 2. Same as above but in a sample userspace C application which calls ioctl twice in a row to achieve the same. Both did not recreate the problem. The only way to recreate it IMHO is to either: run the ioctl from two parallel threads or perhaps assign RT fifo priority to the application calling ioctl in a row. Elad > > On Mon, Oct 4, 2021 at 12:19 PM Elad Nachman <ela...@gmail.com> wrote: > >> >> >> On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit <ferruh.yi...@intel.com> >> wrote: >> >>> On 10/4/2021 3:58 PM, Elad Nachman wrote: >>> > בתאריך יום ב׳, 4 באוק׳ 2021, 17:51, מאת Ferruh Yigit < >>> > ferruh.yi...@intel.com>: >>> > >>> >> On 10/4/2021 3:25 PM, Elad Nachman wrote: >>> >> >>> >> Can you please try to not top post, it will make impossible to follow >>> this >>> >> discussion later from the mail archives. >>> >> >>> >>> 1. Userspace will get an error >>> >> >>> >> So there is nothing special with returning '-EAGAIN', user will only >>> >> observe an >>> >> error. >>> >> Wasn't initial intention to use '-EAGAIN' to try request again? >>> >> >>> > To signal user-space to retry the operation. >>> > >>> >>> Not sure if it will reach to the end user. If user is calling "ifconfig >>> <iface> >>> down", it will just fail right, it won't recognize the error type. >>> >>> Unless this is common usage by the Linux network drivers, having this >>> usage in >>> KNI won't help much. I am for handling this in the kernel side if we can. >>> >>> >> If user calls ifconfig <iface> down it will not happen. It requires some >> multi-core race condition only Eric can recreate. >> >> >>> >> >>> >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl >>> unlocked >>> >>> for interface down command causes a crash because of a race >>> condition in >>> >>> the device delete/unregister list in the kernel. >>> >>> >>> >> >>> >> Why waiting with rthnl lock causes a deadlock? As said below we are >>> already >>> >> doing it, why it is different with retry logic? >>> >> >>> > Because it can be interface down request. >>> > >>> >>> (sure you like short answers) >>> >>> Please help me to see why "interface down" is special. Isn't it point of >>> your >>> patch to wait the request execution in the userspace even it is an async >>> request? >>> >>> And yet again, number of retry can be limited. >>> >>> >> No, it is not. Please look again: >> https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-ela...@gmail.com/ >> >> >> >>> >>> > >>> >> I agree to not wait with rtnl unlocked. >>> >> >>> >>> FYI, >>> >>> >>> >>> Elad. >>> >>> >>> >>> בתאריך יום ב׳, 4 באוק׳ 2021, 17:13, מאת Ferruh Yigit < >>> >>> ferruh.yi...@intel.com>: >>> >>> >>> >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote: >>> >>>>> Hi, >>> >>>>> >>> >>>>> EAGAIN is propogated back to the kernel and to the caller. >>> >>>>> >>> >>>> >>> >>>> So will the user get an error, or it will be handled by the kernel >>> and >>> >>>> retried? >>> >>>> >>> >>>>> We cannot retry from the kni kernel module since we hold the rtnl >>> lock. >>> >>>>> >>> >>>> >>> >>>> Why not? We are already waiting until a command time out, like >>> >>>> 'kni_net_open()' >>> >>>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we >>> can >>> >>>> limit the >>> >>>> number of retry for safety. >>> >>>> >>> >>>>> FYI, >>> >>>>> >>> >>>>> Elad >>> >>>>> >>> >>>>> בתאריך יום ב׳, 4 באוק׳ 2021, 16:05, מאת Ferruh Yigit < >>> >>>>> ferruh.yi...@intel.com>: >>> >>>>> >>> >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote: >>> >>>>>>> Fix lack of multiple KNI requests handling support by >>> introducing a >>> >>>>>>> request in progress flag which will fail additional requests with >>> >>>>>>> EAGAIN return code if the original request has not been processed >>> >>>>>>> by user-space. >>> >>>>>>> >>> >>>>>>> Bugzilla ID: 809 >>> >>>>>> >>> >>>>>> Hi Eric, >>> >>>>>> >>> >>>>>> Can you please test this patch, if it solves the issue you >>> reported? >>> >>>>>> >>> >>>>>>> >>> >>>>>>> Signed-off-by: Elad Nachman <ela...@gmail.com> >>> >>>>>>> --- >>> >>>>>>> kernel/linux/kni/kni_net.c | 9 +++++++++ >>> >>>>>>> lib/kni/rte_kni.c | 2 ++ >>> >>>>>>> lib/kni/rte_kni_common.h | 1 + >>> >>>>>>> 3 files changed, 12 insertions(+) >>> >>>>>>> >>> >>>>>> >>> >>>>>> <...> >>> >>>>>> >>> >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device >>> *dev, >>> >>>>>> struct rte_kni_request *req) >>> >>>>>>> >>> >>>>>>> mutex_lock(&kni->sync_lock); >>> >>>>>>> >>> >>>>>>> + /* Check that existing request has been processed: */ >>> >>>>>>> + cur_req = (struct rte_kni_request *)kni->sync_kva; >>> >>>>>>> + if (cur_req->req_in_progress) { >>> >>>>>>> + ret = -EAGAIN; >>> >>>>>> >>> >>>>>> Overall logic in the KNI looks good to me, this helps to >>> serialize the >>> >>>>>> requests >>> >>>>>> even for async ones. >>> >>>>>> >>> >>>>>> But can you please clarify how it behaves in the kernel side with >>> >>>> '-EAGAIN' >>> >>>>>> return type? Will linux call the ndo again, or will it just fail. >>> >>>>>> >>> >>>>>> If it just fails should we handle the re-try on '-EAGAIN' within >>> the >>> >> kni >>> >>>>>> module? >>> >>>>>> >>> >>>>>> >>> >>>> >>> >>>> >>> >> >>> >> Elad. >>> >>>