From: Daniel Borkmann <dan...@iogearbox.net>
Date: Fri, 7 Aug 2015 00:26:41 +0200

> Linus reports the following deadlock on rtnl_mutex; triggered only
> once so far (extract):
 ...
> So far, the recursive call into rtnetlink_rcv() looks suspicious.
> One way this could trigger is that the sender's
> NETLINK_CB(skb).portid was wrongly 0 (which is the rtnetlink socket),
> so the answer to the rtnl_getlink() request would be sent to the
> kernel instead of to the actual user process, thus grabbing
> rtnl_mutex twice.
>
> One theory would be that netlink_autobind(), triggered via
> netlink_sendmsg(), internally overwrites the -EBUSY error to 0, but
> that the error wrongly originates from __netlink_insert() instead.
> That would reset the socket's portid to 0, which is then filled into
> NETLINK_CB(skb).portid later on. As commit d470e3b483dc ("[NETLINK]:
> Fix two socket hashing bugs.") also puts it, -EBUSY should not be
> propagated from netlink_insert().
>
> This looks very unlikely to reproduce. We need to trigger the
> rhashtable_insert_rehash() handler in a situation where rehashing
> is currently in progress (one /rare/ way would be to hit the
> ht->elasticity limits while the table is not filled enough to
> expand, but that would require a specifically crafted bind()
> sequence with knowledge of the destination slots, which seems
> unlikely). It probably makes sense to guard __netlink_insert() in
> any case and remap that error. It was suggested that EOVERFLOW might
> be better than an already overloaded ENOMEM.
>
> Reference: http://thread.gmane.org/gmane.linux.network/372676
> Reported-by: Linus Torvalds <torva...@linux-foundation.org>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>

Applied and queued up for -stable, thanks.
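
For readers following along, the error-remapping idea from the quoted
commit message can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the kernel code or the actual
patch: struct fake_sock, table_insert(), guarded_insert(), autobind() and
the remap flag are all hypothetical names, and table_err simply simulates
whatever the hash table insert returned.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a socket; portid 0 means "not bound". */
struct fake_sock {
	unsigned int portid;
	int bound;
};

/* Hypothetical stand-in for the hash table insert. table_err simulates
 * its result: 0 on success, or a negative errno such as -EBUSY. */
static int table_insert(struct fake_sock *sk, unsigned int portid,
			int table_err)
{
	if (table_err)
		return table_err;
	sk->portid = portid;
	sk->bound = 1;
	return 0;
}

/* Insert path. -EBUSY is reserved here for "socket is already bound".
 * With remap set, an -EBUSY coming from the table itself is turned into
 * -EOVERFLOW so callers cannot confuse the two cases. */
static int guarded_insert(struct fake_sock *sk, unsigned int portid,
			  int table_err, int remap)
{
	int err;

	if (sk->bound)
		return -EBUSY;

	err = table_insert(sk, portid, table_err);
	if (remap && err == -EBUSY)
		err = -EOVERFLOW;
	return err;
}

/* Autobind path. It treats -EBUSY as success, on the assumption that it
 * can only mean the socket was bound by someone else in the meantime. */
static int autobind(struct fake_sock *sk, unsigned int portid,
		    int table_err, int remap)
{
	int err = guarded_insert(sk, portid, table_err, remap);

	if (err == -EBUSY)
		err = 0;
	return err;
}

/* Usage: shows the unguarded and the guarded behaviour side by side. */
void demo(void)
{
	struct fake_sock a = { 0, 0 };
	struct fake_sock b = { 0, 0 };

	/* Unguarded: success is reported, yet the portid is still 0. */
	assert(autobind(&a, 1234, -EBUSY, 0) == 0);
	assert(a.portid == 0);

	/* Guarded: the caller sees a real error instead. */
	assert(autobind(&b, 1234, -EBUSY, 1) == -EOVERFLOW);
}
```

In the unguarded case the caller believes the bind succeeded while the
portid is still 0, which matches the theory in the quoted text. With the
remap in place, -EBUSY keeps a single meaning on the autobind path.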