Re: Crash due to mutex genl_lock called from RCU context

2016-11-29 Thread David Miller
From: Herbert Xu
Date: Mon, 28 Nov 2016 19:22:12 +0800

> netlink: Call cb->done from a worker thread
>
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion. This patch fixes
> it by adding a worker struct to make the cb->done call whe

Re: Crash due to mutex genl_lock called from RCU context

2016-11-28 Thread Cong Wang
On Mon, Nov 28, 2016 at 3:22 AM, Herbert Xu wrote:
> netlink: Call cb->done from a worker thread
>
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion. This patch fixes
> it by adding a worker struct to make the cb->done call where
> n

Re: Crash due to mutex genl_lock called from RCU context

2016-11-28 Thread Herbert Xu
On Sun, Nov 27, 2016 at 10:53:21PM -0800, Cong Wang wrote:
>
> I just took a deeper look, some user calls rhashtable_destroy() in ->done(),
> so even removing that genl lock is not enough, perhaps we should just
> move it to a work struct like what Daniel does for the tcf_proto, but that is
> ugly.

Re: Crash due to mutex genl_lock called from RCU context

2016-11-27 Thread Cong Wang
On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet wrote:
> On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
>> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>> >
>> > Are you telling me inet_release() is called when we close() the first
>> > file descriptor ?
>> >
>> > fd1 = socket()
>> > f

Re: Crash due to mutex genl_lock called from RCU context

2016-11-27 Thread Eric Dumazet
On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
> >
> > Are you telling me inet_release() is called when we close() the first
> > file descriptor ?
> >
> > fd1 = socket()
> > fd2 = dup(fd1);
> > close(fd2) -> release() ???
>
> Sorry, I di

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang
On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet wrote:
>
> Are you telling me inet_release() is called when we close() the first
> file descriptor ?
>
> fd1 = socket()
> fd2 = dup(fd1);
> close(fd2) -> release() ???

Sorry, I didn't express myself clearly, I meant your change, if exclude the SOCK_RC

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Eric Dumazet
On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
> >
> > Oh well, this won't work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from netlink_sock_destruct() but
> > from netlink_re

Re: Crash due to mutex genl_lock called from RCU context

2016-11-26 Thread Cong Wang
On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet wrote:
>
> Oh well, this won't work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior of cb.done(), currently

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread subashab
Oh well, this won't work, since sk->sk_destruct will be called from RCU callback.

Grabbing the mutex should not be done from netlink_sock_destruct() but from netlink_release()

Maybe this patch would be better:

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 62bea4591054..

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread Eric Dumazet
On Fri, 2016-11-25 at 20:11 -0800, Eric Dumazet wrote:
> On Fri, 2016-11-25 at 19:15 -0700, subas...@codeaurora.org wrote:
> > We are seeing a crash due to genl_lock mutex being acquired in RCU
> > context.
> > Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> > regression rack

Re: Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread Eric Dumazet
On Fri, 2016-11-25 at 19:15 -0700, subas...@codeaurora.org wrote:
> We are seeing a crash due to genl_lock mutex being acquired in RCU
> context.
> Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> regression rack, so unfortunately I don't have steps for a reproducer.
>
> It l

Crash due to mutex genl_lock called from RCU context

2016-11-25 Thread subashab
We are seeing a crash due to the genl_lock mutex being acquired in RCU context. The crash is seen on a 4.4-based kernel ARM64 device. This occurred in a regression rack, so unfortunately I don't have steps for a reproducer.

It looks like freeing socket in RCU was brought in through commit 21e4902aea80ef3