On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet <[email protected]> wrote: > On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote: >> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet <[email protected]> wrote: >> > >> > Are you telling me inet_release() is called when we close() the first >> > file descriptor ? >> > >> > fd1 = socket() >> > fd2 = dup(fd1); >> > close(fd2) -> release() ??? >> >> Sorry, I didn't express myself clearly, I meant your change, >> if exclude the SOCK_RCU_FREE part, basically reverts this commit: >> >> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4 >> Author: Herbert Xu <[email protected]> >> Date: Thu May 3 03:17:14 2007 -0700 >> >> [NETLINK]: Kill CB only when socket is unused >> >> IOW, ->release() is called when the last sock fd ref is gone, but >> ->destructor() >> is called with the last sock ref is gone. They are very different. > > Hmm... > > >> I am confused, what Subash reported is a kernel warning which can >> surely be fixed by removing genl lock (if it is correct, I need to double >> check), so why for net-next? > > Because Subash pointed to a buggy commit. > > We want to fix all issues bring by this commit, not only the immediate > problem about mutex. > > I have no idea if we can safely remove the mutex from genl_lock_done() :
I meant removing it only for the destructor case, we definitely can't remove it for the dump case. > > The genl_lock() is not only protecting the socket itself, it might > protect global data as well, or protect some kind of lock ordering among > multiple mutexes. > > Have you checked all genl users, down to linux-4.0 , point where commit > 21e4902aea80ef35a was added ? > I just took a deeper look, some user calls rhashtable_destroy() in ->done(), so even removing that genl lock is not enough, perhaps we should just move it to a work struct like what Daniel does for the tcf_proto, but that is ugly... I don't know if RCU provides any API to execute the callback in process context.
