Kirill Tkhai <ktk...@virtuozzo.com> writes:

> peernet2id_alloc() is racy without rtnl_lock() as atomic_read(&peer->count)
> under net->nsid_lock does not guarantee peer is alive:
>
>   rcu_read_lock()
>   peernet2id_alloc()                        ..
>     spin_lock_bh(&net->nsid_lock)           ..
>     atomic_read(&peer->count) == 1          ..
>     ..                                      put_net()
>     ..                                      cleanup_net()
>     ..                                        for_each_net(tmp)
>     ..                                          spin_lock_bh(&tmp->nsid_lock)
>     ..                                          __peernet2id(tmp, net) == -1
>     ..                                      ..
>     __peernet2id_alloc(alloc == true)       ..
>   rcu_read_unlock()                         ..
>   ..                                        synchronize_rcu()
>   ..                                        kmem_cache_free(net)
>
> After the above situation, net::netns_ids contains an id pointing to freed
> memory, and any other dereference by that id will operate on this freed
> memory.
>
> Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
> ovs_vport_cmd_fill_info(), so this race can't occur.  But peernet2id_alloc()
> is a generic interface, and we had better fix it before someone really
> starts to use it in the wrong context.
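For reference, the maybe_get_net() helper the patch below leans on is,
in kernels of this era, roughly the following inline from
include/net/net_namespace.h (paraphrased here for illustration; the
in-tree header is authoritative):

	/* Take a reference on net only if its refcount is not already
	 * zero, i.e. only if the namespace has not yet entered
	 * cleanup_net().  Returns net on success, NULL otherwise.
	 */
	static inline struct net *maybe_get_net(struct net *net)
	{
		if (!atomic_inc_not_zero(&net->count))
			net = NULL;
		return net;
	}

The atomic_inc_not_zero() is what lets the fix distinguish a peer that
is still alive from one whose teardown has already begun.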
So it comes down to this piece of code from ovs, and just let me say:
ick.

	if (!net_eq(net, dev_net(vport->dev))) {
		int id = peernet2id_alloc(net, dev_net(vport->dev));

		if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
			goto nla_put_failure;
	}

Without the rtnl lock, dev_net can change between the test and the
call of peernet2id_alloc.

At first glance it looks like the bug is that we are running a control
path of the networking stack without the rtnl lock.  So it may be that
ASSERT_RTNL() is the better fix.

Given that it would be nice to reduce the scope of the rtnl lock, this
might not be a bad direction.

Let me see.  Is rtnl_notify safe without the rtnl lock?

> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> ---
>  net/core/net_namespace.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..6a4eab438221 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
>   */
>  int peernet2id_alloc(struct net *net, struct net *peer)
>  {
> -	bool alloc;
> +	bool alloc = false, alive = false;
>  	int id;
          ^^^
Perhaps we want "ASSERT_RTNL();" here?

>
> -	if (atomic_read(&net->count) == 0)
> -		return NETNSA_NSID_NOT_ASSIGNED;

Moving this hunk is of no benefit.  The code must be called with a
valid reference to net, which means net->count is a fancy way of
testing whether the code is in cleanup_net.  In all other cases
net->count should be non-zero, and it should remain that way because
our caller must keep a reference.

>  	spin_lock_bh(&net->nsid_lock);
> -	alloc = atomic_read(&peer->count) == 0 ? false : true;
> +	/* Spinlock guarantees we never hash a peer to net->netns_ids
> +	 * after idr_destroy(&net->netns_ids) occurs in cleanup_net().
> +	 */
> +	if (atomic_read(&net->count) == 0) {
> +		id = NETNSA_NSID_NOT_ASSIGNED;
> +		goto unlock;
> +	}
> +	/*
> +	 * When peer is obtained from RCU lists, we may race with
> +	 * its cleanup. Check whether it's alive, and this guarantees
> +	 * we never hash a peer back to net->netns_ids, after it has
> +	 * just been idr_remove()'d from there in cleanup_net().
> +	 */
> +	if (maybe_get_net(peer))
> +		alive = alloc = true;

Yes, this does seem reasonable.  The more obvious-looking code, which
would return NETNSA_NSID_NOT_ASSIGNED if the peer has a count of 0, is
silly, as it would make it appear that a peer is momentarily outside
of a network namespace when the peer is in fact moving from one
network namespace to another.

>  	id = __peernet2id_alloc(net, peer, &alloc);
> +unlock:
>  	spin_unlock_bh(&net->nsid_lock);
>  	if (alloc && id >= 0)
>  		rtnl_net_notifyid(net, RTM_NEWNSID, id);
          ^^^^^^
Is this safe without the rtnl lock?

> +	if (alive)
> +		put_net(peer);
>  	return id;
>  }
>  EXPORT_SYMBOL_GPL(peernet2id_alloc);

Eric
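For concreteness, the ASSERT_RTNL() direction floated above might look
like the sketch below -- an illustration of the alternative, not the
patch under review.  It documents the locking assumption rather than
making the function safe to call from RCU context, so the ovs caller
would then have to take rtnl_lock (or trip the assertion):

	int peernet2id_alloc(struct net *net, struct net *peer)
	{
		bool alloc;
		int id;

		/* Assert (hypothetically) that every caller holds
		 * rtnl_lock, which today is true of everyone except
		 * ovs_vport_cmd_fill_info().
		 */
		ASSERT_RTNL();

		if (atomic_read(&net->count) == 0)
			return NETNSA_NSID_NOT_ASSIGNED;
		spin_lock_bh(&net->nsid_lock);
		alloc = atomic_read(&peer->count) == 0 ? false : true;
		id = __peernet2id_alloc(net, peer, &alloc);
		spin_unlock_bh(&net->nsid_lock);
		if (alloc && id >= 0)
			rtnl_net_notifyid(net, RTM_NEWNSID, id);
		return id;
	}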
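Similarly, the dev_net() TOCTOU in the ovs snippet quoted earlier could
at least be narrowed by reading the peer namespace once.  This is a
hypothetical rewrite, shown only to make the window visible; it does
not by itself pin the peer, which can still be torn down concurrently,
and that is the race the patch actually closes:

	struct net *peer = dev_net(vport->dev);	/* single read */

	if (!net_eq(net, peer)) {
		int id = peernet2id_alloc(net, peer);

		if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
			goto nla_put_failure;
	}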