On 9/30/19 8:01 PM, Eric Dumazet wrote: > > > On 9/30/19 6:37 PM, David Ahern wrote: >> From: David Ahern <dsah...@gmail.com> >> >> Rajendra reported a kernel panic when a link was taken down: >> >> [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at >> 00000000000000a8 >> [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 >> >> <snip> >> >> [ 6870.570501] Call Trace: >> [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 >> [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 >> [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 >> [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 >> [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 >> [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 >> [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 >> [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 >> [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 >> [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 >> [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 >> [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 >> [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 >> >> addrconf_dad_work is kicked to be scheduled when a device is brought >> up. There is a race between addrcond_dad_work getting scheduled and >> taking the rtnl lock and a process taking the link down (under rtnl). >> The latter removes the host route from the inet6_addr as part of >> addrconf_ifdown which is run for NETDEV_DOWN. The former attempts >> to use the host route in ipv6_ifa_notify. If the down event removes >> the host route due to the race to the rtnl, then the BUG listed above >> occurs. >> >> This scenario does not occur when the ipv6 address is not kept >> (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the >> state of the ifp to DEAD. Handle when the addresses are kept by checking >> IF_READY which is reset by addrconf_ifdown. >> >> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") >> Reported-by: Rajendra Dendukuri <rajendra.denduk...@broadcom.com> >> Signed-off-by: David Ahern <dsah...@gmail.com> >> --- >> net/ipv6/addrconf.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c >> index 6a576ff92c39..e2759ef73b03 100644 >> --- a/net/ipv6/addrconf.c >> +++ b/net/ipv6/addrconf.c >> @@ -4032,6 +4032,12 @@ static void addrconf_dad_work(struct work_struct *w) >> >> rtnl_lock(); >> >> + /* check if device was taken down before this delayed work >> + * function could be canceled >> + */ >> + if (!(idev->if_flags & IF_READY)) >> + goto out; >> + >> spin_lock_bh(&ifp->lock); >> if (ifp->state == INET6_IFADDR_STATE_PREDAD) { >> action = DAD_BEGIN; >> > > > Do we need to keep the test on IF_READY done later in this function ? > > If IF_READY can disappear only under RTNL, we might clean this. > > (unless addrconf_dad_work() releases rtnl and reacquires it)
Unless I am missing something none of the functions called by dad_work release the rtnl, but your comment did have me second guessing the locking. The interesting cases for changing the idev flag are addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown (reset the flag). The former does not have the idev lock - only rtnl. The latter has both. Checking the flag is inconsistent with respect to locks. As for your suggestion, the 'dead' flag is set only under rtnl in addrconf_ifdown and it means the device is getting removed (or IPv6 is disabled). Based on that I think the existing: if (idev->dead || !(idev->if_flags & IF_READY)) goto out; can be moved to right after the rtnl_lock in addrconf_dad_work in place of the above change, so the end result is: diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 6a576ff92c39..dd3be06d5a06 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4032,6 +4032,12 @@ static void addrconf_dad_work(struct work_struct *w) rtnl_lock(); + /* check if device was taken down before this delayed work + * function could be canceled + */ + if (idev->dead || !(idev->if_flags & IF_READY)) + goto out; + spin_lock_bh(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_PREDAD) { action = DAD_BEGIN; @@ -4077,11 +4083,6 @@ static void addrconf_dad_work(struct work_struct *w) goto out; write_lock_bh(&idev->lock); - if (idev->dead || !(idev->if_flags & IF_READY)) { - write_unlock_bh(&idev->lock); - goto out; - } - spin_lock(&ifp->lock); if (ifp->state == INET6_IFADDR_STATE_DEAD) { spin_unlock(&ifp->lock); agree?