On 22 February 2016 at 01:38, Ian Kumlien <ian.kuml...@gmail.com> wrote:
> Hi,
>
> When i tried to upgrade my, soon to be, firewall to 4.5-rc5 to do some
> testing - it deadlocked almost instantly.

After bisect, the offending patch seems to be:
b16c29191dc89bd877af99a7b04ce4866728a3e0

It looks like some basic sanity checking went missing...

The original patch does:
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 5d010f2..94837d2 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -307,12 +307,12 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 
 	local_bh_disable();
 	for (i = 0; i < net->ct.htable_size; i++) {
-		spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 		if (i < net->ct.htable_size) {
 			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 				untimeout(h, timeout);
 		}
-		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
 	local_bh_enable();
 }
---

Which looks like a mistake - the fix should be:
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 94837d2..2671b9d 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -312,7 +312,7 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 				untimeout(h, timeout);
 		}
-		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
 	local_bh_enable();
 }
---

And it fixes my issue! ;)

> In the photo, i started writing "root" and it keeps repeating it, like
> it's in a while loop.
>
> https://goo.gl/photos/yGhNSogJjeb2VJyu5
>
> Trying to get better information - as in any - i enabled quite a few
> debugging options that could have any bearing on it and ended up with:
> https://goo.gl/photos/NnQER2WXXJ5ZWPR67
>
> The interesting part is that in this case the machine was booted in to
> single user mode and did not crash.
>
> It seems like it gets in to trouble when the bridges and the network
> interfaces are enabled, as in just about a second or two after boot.

[--8<--]
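
For anyone wondering why the mismatched call hangs right away, here is a
minimal userspace sketch of the same loop shape. It is not the kernel code:
it assumes a C11 atomic_flag as a stand-in for a spinlock, uses a
non-blocking try_lock() so that the hang shows up as a return value instead
of an actual spin, and the names NLOCKS, NBUCKETS, try_lock(), unlock() and
walk_buckets() are all made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NLOCKS   4	/* hypothetical stand-in for CONNTRACK_LOCKS */
#define NBUCKETS 16	/* hypothetical stand-in for net->ct.htable_size */

static atomic_flag locks[NLOCKS] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Non-blocking acquire: true if we took the lock, false if it was held. */
static bool try_lock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Walk every bucket under its striped lock. Returns -1 if the walk
 * completes, or the index of the bucket at which a blocking spinlock
 * would have spun forever.
 */
static int walk_buckets(bool buggy)
{
	for (int i = 0; i < NBUCKETS; i++) {
		atomic_flag *l = &locks[i % NLOCKS];

		if (!try_lock(l))
			return i;	/* lock leaked by an earlier iteration */

		/* ... per-bucket work would go here ... */

		if (buggy && !try_lock(l)) {
			/* Second acquire where the release belongs. */
			unlock(l);	/* tidy up so the sketch can be re-run */
			return i;
		}
		unlock(l);
	}
	return -1;
}
```

Under those assumptions walk_buckets(true) stops at bucket 0, because the
lock is still held by the first acquire in the same iteration, while
walk_buckets(false) walks all buckets and returns -1.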
From caff3fec1641ba3e207ff705b68eba62dec3bef9 Mon Sep 17 00:00:00 2001
From: Ian Kumlien <ian.kuml...@gmail.com>
Date: Wed, 24 Feb 2016 23:40:57 +0100
Subject: [PATCH] netfilter: nf_conntrack: lock error

A lock error was introduced during the lock cleanup lets undo that, =)

Signed-off-by: Ian Kumlien <ian.kuml...@gmail.com>
---
 net/netfilter/nfnetlink_cttimeout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 94837d2..2671b9d 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -312,7 +312,7 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
 				untimeout(h, timeout);
 		}
-		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
 	}
 	local_bh_enable();
 }
-- 
2.7.2