Stephen,

No, I don't have a better proposal, but I think it is not correct to change
the behavior of KNI (bringing the link down without a real response).
Even though we know that communicating with userspace under rtnl_lock is a
bad idea, it has worked this way for many years already.
Elad,

I agree with you that KNI should be removed from the main tree if it is not
possible to fix this __dev_close_many issue.
There were discussions about this multiple times already, but no one is
working on it AFAIK. The last discussion was a month ago:
https://www.mail-archive.com/dev@dpdk.org/msg196033.html

Igor

On Fri, Feb 26, 2021 at 8:43 PM Elad Nachman <ela...@gmail.com> wrote:

> The way the kernel handles its locks and lists for the dev close many
> path, there is no way you can go around this with rtnl unlocked:
>
> "
> There is a race condition in __dev_close_many() processing the
> close_list while the application terminates.
> It looks like if two vEth devices are terminating,
> and one releases the rtnl lock, the other takes it,
> updating the close_list in an unstable state,
> causing the close_list to become a circular linked list,
> hence list_for_each_entry() will endlessly loop inside
> __dev_close_many().
> "
>
> And I don't expect David Miller will bend the kernel networking for DPDK
> or KNI.
>
> But - Stephen - if you can personally convince David to accept a
> kernel patch which will separate the close_list locking mechanism into a
> separate (RCU?) lock, then I can first introduce a patch to the kernel
> which will add a lock for the close_list; this way rtnl can be
> unlocked for the if-down case.
>
> After that kernel patch, your original patch plus relocation of the sync
> mutex locking will do the job.
>
> Otherwise, rtnl has to be kept locked all the way for the if-down
> case in order to prevent corruption turning the close_list into a
> circular linked list and causing a hang in the kernel.
>
> Currently, the rtnl lock is the only thing keeping the close_list from
> corruption.
>
> If you doubt that rtnl cannot be unlocked for the dev close path, you can
> consult David for his opinion, as I think it is critical to understand
> what the kernel can or cannot do, or expects to be done, before we
> unlock its locks as we wish inside rte_kni.ko.
>
> Otherwise, if we are still in disagreement on how to patch this set of
> problems, I think the responsible way around it is to completely
> remove KNI from the main DPDK tree and move it to the dpdk-kmods
> repository.
>
> I know BSD-style open source does not place legal responsibility on
> the developers, but I think when a bunch of developers know a piece of
> code is highly buggy, they should not leave it for countless new users
> to bounce their heads desperately against, if they cannot agree on a
> correct way to solve the set of problems which I think we all agree
> exist (we just do not agree on the proper solution or patch)...
>
> That's my two cents,
>
> Elad.
>
> On Fri, Feb 26, 2021 at 5:49 PM Stephen Hemminger
> <step...@networkplumber.org> wrote:
> >
> > On Fri, 26 Feb 2021 00:01:01 +0300
> > Igor Ryzhov <iryz...@nfware.com> wrote:
> >
> > > Hi Elad,
> > >
> > > Thanks for the patch, but this is still NACK from me.
> > >
> > > The only real advantage of KNI over other exceptional-path techniques
> > > like virtio-user is the ability to configure DPDK-managed interfaces
> > > directly from the kernel using well-known utils like iproute2.
> > > A very important part of this is getting responses from the DPDK app
> > > and knowing the actual result of command execution.
> > > If you're making async requests to the application and you don't know
> > > the result, then what's the point of using KNI at all?
> > >
> > > Igor
> >
> > Do you have a better proposal that keeps the request result but does not
> > call userspace with lock held?
> >
> > PS: I also have a strong dislike of KNI; as designed, it would have been
> > rejected by Linux kernel developers. A better solution would be a
> > userspace version of something like devlink devices. But doing control
> > operations by proxy is a locking nightmare.
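
For readers trying to follow the close_list argument above, here is a minimal
user-space model of the hazard Elad describes. It is not the actual
net/core/dev.c or rte_kni.ko source: the list helpers, the "rtnl" mutex, the
device type, and the names are stand-ins invented for illustration. The point
it shows is that each closing device is linked onto a caller-provided list
through a node embedded in the device itself, so the whole walk has to happen
under one uninterrupted hold of the lock.

#include <pthread.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* The close_list node lives inside the device, as in struct net_device. */
struct device {
	struct list_head close_list;
	const char *name;
};

static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Roughly the shape of dev_close_many(): walk every device linked on head. */
static void close_many(struct list_head *head)
{
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct device *dev = (struct device *)pos;

		printf("closing %s\n", dev->name);
		/*
		 * If the per-device stop path released "rtnl" here to wait
		 * for a userspace answer (the KNI pattern under discussion),
		 * the other closer thread could take the lock and link this
		 * same node onto its own list, leaving the walk on a
		 * corrupted, possibly circular list.
		 */
	}
}

static void *closer(void *arg)
{
	struct device *dev = arg;
	struct list_head head;

	list_init(&head);
	pthread_mutex_lock(&rtnl);
	list_add_tail(&dev->close_list, &head);
	close_many(&head);	/* must not drop rtnl in the middle */
	pthread_mutex_unlock(&rtnl);
	return NULL;
}

int main(void)
{
	struct device a = { .name = "vEth0" }, b = { .name = "vEth1" };
	pthread_t t1, t2;

	pthread_create(&t1, NULL, closer, &a);
	pthread_create(&t2, NULL, closer, &b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

The model never actually drops the lock, so it runs cleanly; the comment only
marks the spot where an rtnl release would have to happen, which is exactly
what Elad argues the kernel's list handling cannot tolerate.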
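
On the other side of the argument, Igor's and Stephen's point is that the
whole value of KNI is that the caller learns the application's real answer.
The sketch below is a deliberately simplified user-space model of that
trade-off; the function names and the pthread plumbing are invented for
illustration and are not how rte_kni.ko actually passes requests. A
synchronous request returns whatever result the application produced, while a
fire-and-forget request can only ever report success.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done = PTHREAD_COND_INITIALIZER;
static bool answered;
static int app_result;

/* Stands in for the DPDK application servicing a KNI control request. */
static void *application(void *arg)
{
	(void)arg;
	sleep(1);			/* pretend to reconfigure the port */
	pthread_mutex_lock(&lock);
	app_result = -1;		/* e.g. the port rejected the change */
	answered = true;
	pthread_cond_signal(&done);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Synchronous: the caller learns the real outcome, at the cost of waiting
 * (in KNI's case, waiting while rtnl is held). */
static int request_sync(void)
{
	pthread_mutex_lock(&lock);
	while (!answered)
		pthread_cond_wait(&done, &lock);
	pthread_mutex_unlock(&lock);
	return app_result;
}

/* Asynchronous: returns immediately, so the caller always sees success no
 * matter what the application actually did. */
static int request_async(void)
{
	return 0;
}

int main(void)
{
	pthread_t app;

	pthread_create(&app, NULL, application, NULL);
	printf("async result: %d (real outcome unknown)\n", request_async());
	printf("sync result:  %d (the application's actual answer)\n",
	       request_sync());
	pthread_join(app, NULL);
	return 0;
}

Stephen's question above is exactly the tension between these two functions:
how to keep the synchronous behaviour without blocking under rtnl_lock.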