On 4/22/17 4:00 PM, Martin KaFai Lau wrote:
> On Sat, Apr 22, 2017 at 09:40:37AM -0700, David Ahern wrote:
> [...]
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 08f9e8ea7a81..97e86158bbcb 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device
>> *dev)
>>  static int fixup_permanent_addr(struct inet6_dev *idev,
>>                                  struct inet6_ifaddr *ifp)
>>  {
>> -       if (!ifp->rt) {
>> -               struct rt6_info *rt;
>> +       /* rt6i_ref == 0 means the host route was removed from the
>> +        * FIB, for example, if 'lo' device is taken down. In that
>> +        * case regenerate the host route.
>> +        */
>> +       if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
>> +               struct rt6_info *rt, *prev;
>>
>>                 rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> The rt regeneration makes sense.
>
>>                 if (unlikely(IS_ERR(rt)))
>>                         return PTR_ERR(rt);
>>
>> +               spin_lock(&ifp->lock);
>> +               prev = ifp->rt;
>>                 ifp->rt = rt;
> I am still missing something on the new spin_lock:
> 1) Is there an existing race in the existing
>    ifp->rt modification ('ifp->rt = rt') which is
>    not related to this bug?
> 2) If there is a race in ifp->rt, is the above if-checks
>    on ifp->rt racy and need protection also?  F.e. 'ifp->rt->rt6i_ref'
>    since ifp->rt could be NULL or ifp->rt->rt6i_ref
>    may not be zero later if there is concurrent
>    modification on ifp->rt?

As I understand it:
- rt6i_ref is modified by the fib code (adding to and removing from the
  tree) and always under RTNL.
- ifp->rt is only *set* under RTNL, but is accessed without it (DAD via
  workqueue and sysctl).

The code path to fixup_permanent_addr is under RTNL, so the if check on
ifp->rt and rt6i_ref is ok -- neither can be changed since RTNL is held.
Since ifp->rt can be accessed outside of RTNL, the spinlock is needed to
change its value. Arguably only 'ifp->rt = rt;' needs the spinlock.

Let me know if I am missing something. There are many twists and turns
with the ipv6 code.

>
>> +               spin_unlock(&ifp->lock);
>> +
>> +               if (prev)
>> +                       ip6_rt_put(prev);
> Nit. ip6_rt_put() takes NULL.

ok.
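
To make the locking argument above concrete, here is a rough userspace
sketch of the same pattern. It is not kernel code and not the patch
itself: pthread mutexes stand in for RTNL and for the ifp->lock
spinlock, and every name ending in _model (along with the fib_ref and
refcnt fields) is made up for illustration. The only point is that the
unlocked if-check is safe because all writers hold the outer lock, while
the pointer swap still needs the inner lock because readers do not.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rt6_info. */
struct route_model {
	int fib_ref;	/* models rt6i_ref: nonzero while in the FIB */
	int refcnt;	/* models the reference that ip6_rt_put() drops */
};

/* Hypothetical stand-in for struct inet6_ifaddr. */
struct ifaddr_model {
	pthread_mutex_t lock;		/* models ifp->lock */
	struct route_model *rt;		/* models ifp->rt */
};

/* One global mutex models RTNL; rtnl_held is a crude ASSERT_RTNL() aid
 * that is only meaningful for the thread that took the lock.
 */
static pthread_mutex_t rtnl_model = PTHREAD_MUTEX_INITIALIZER;
static bool rtnl_held;

static void rtnl_lock_model(void)
{
	pthread_mutex_lock(&rtnl_model);
	rtnl_held = true;
}

static void rtnl_unlock_model(void)
{
	rtnl_held = false;
	pthread_mutex_unlock(&rtnl_model);
}

static void ifaddr_init_model(struct ifaddr_model *ifp)
{
	pthread_mutex_init(&ifp->lock, NULL);
	ifp->rt = NULL;
}

static struct route_model *route_alloc_model(void)
{
	struct route_model *rt = calloc(1, sizeof(*rt));

	if (rt) {
		rt->fib_ref = 1;	/* pretend it was inserted into the FIB */
		rt->refcnt = 1;
	}
	return rt;
}

/* Like ip6_rt_put(), accepts NULL so callers need no check. */
static void route_put_model(struct route_model *rt)
{
	if (rt && --rt->refcnt == 0)
		free(rt);
}

/* Reader that runs without RTNL (think DAD work or sysctl): it must
 * take ifp->lock before looking at ifp->rt.
 */
static bool ifaddr_has_route_model(struct ifaddr_model *ifp)
{
	bool ret;

	pthread_mutex_lock(&ifp->lock);
	ret = ifp->rt != NULL;
	pthread_mutex_unlock(&ifp->lock);
	return ret;
}

/* Writer; the caller must hold the RTNL model lock. */
static int fixup_permanent_addr_model(struct ifaddr_model *ifp)
{
	struct route_model *rt, *prev;

	assert(rtnl_held);

	/* Unlocked check is fine here: ifp->rt and fib_ref are only
	 * changed by paths that hold RTNL, which we hold.
	 */
	if (ifp->rt && ifp->rt->fib_ref)
		return 0;

	rt = route_alloc_model();
	if (!rt)
		return -1;

	/* Readers may run concurrently, so swap under ifp->lock. */
	pthread_mutex_lock(&ifp->lock);
	prev = ifp->rt;
	ifp->rt = rt;
	pthread_mutex_unlock(&ifp->lock);

	route_put_model(prev);	/* drop the stale route outside the lock */
	return 0;
}
```

A caller would wrap fixup_permanent_addr_model() in rtnl_lock_model() /
rtnl_unlock_model(), while ifaddr_has_route_model() can be called from
any thread. Dropping the old route after the unlock keeps the critical
section down to the pointer swap, which matches the "arguably only
'ifp->rt = rt;' needs the spinlock" remark.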