On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
> On 6/8/17 11:55 PM, Cong Wang wrote:
> > On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <gree...@candelatech.com> wrote:
> >>
> >> As far as I can tell, the patch did not help, or at least we still
> >> reproduce the crash easily.
> >
> > netlink dump is serialized by nlk->cb_mutex so I don't think that
> > patch makes any sense w.r.t race condition.
>
> From what I can see fn_sernum should be accessed under table lock, so
> when saving and checking it during a walk make sure it the lock is held.
> That has nothing to do with the netlink dump, but the table changing
> during a walk.

Yes, your patch makes total sense, of course.

> >
> >> (gdb) l *(fib6_walk_continue+0x76)
> >> 0x188c6 is in fib6_walk_continue
> >> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
> >> 1588                    if (fn == w->root)
> >> 1589                            return 0;
> >> 1590                    pn = fn->parent;
> >> 1591                    w->node = pn;
> >> 1592    #ifdef CONFIG_IPV6_SUBTREES
> >> 1593                    if (FIB6_SUBTREE(pn) == fn) {
> >
> > Apparently fn->parent is NULL here for some reason, but
> > I don't know if that is expected or not. If a simple NULL check
> > is not enough here, we have to trace why it is NULL.
>
> From my understanding, parent should not be null hence the attempts to
> fix access to table nodes under a lock. ie., figuring out why it is null
> here.
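
To make the two ideas in the quoted discussion concrete, here is a minimal userspace sketch, not the kernel code. It models a walker that saves a serial number when it suspends, rechecks it before resuming, and refuses to dereference a NULL parent when stepping up. All names (struct node, struct walker, walk_suspend, walk_step_up, the WALK_* codes) are hypothetical stand-ins for illustration. The sketch assumes the caller holds whatever lock protects the table while these helpers run, and it does not take any lock itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the fib6 node and walker. */
struct node {
    struct node *parent;   /* NULL only for the root, by assumption */
    int sernum;            /* loosely models fn_sernum */
};

struct walker {
    struct node *root;     /* where the walk ends */
    struct node *node;     /* current position */
    int saved_sernum;      /* serial number seen when the walk was suspended */
};

enum walk_status {
    WALK_DONE     = 0,     /* reached the root */
    WALK_MOVED_UP = 1,     /* stepped to the parent */
    WALK_RESTART  = -1,    /* tree changed while suspended */
    WALK_BROKEN   = -2     /* non-root node with a NULL parent */
};

/* Record the serial number of the current node.
 * Assumption: the caller holds the table lock. */
static void walk_suspend(struct walker *w)
{
    assert(w != NULL && w->node != NULL);
    w->saved_sernum = w->node->sernum;
}

/* Step from the current node to its parent.
 * Assumption: the caller holds the same table lock as in walk_suspend(). */
static int walk_step_up(struct walker *w)
{
    struct node *fn, *pn;

    assert(w != NULL && w->node != NULL);
    fn = w->node;

    /* The saved value is only meaningful if it was read under the lock. */
    if (fn->sernum != w->saved_sernum)
        return WALK_RESTART;

    if (fn == w->root)
        return WALK_DONE;

    pn = fn->parent;
    if (pn == NULL)
        return WALK_BROKEN;   /* report instead of dereferencing pn */

    w->node = pn;
    w->saved_sernum = pn->sernum;
    return WALK_MOVED_UP;
}
```

A caller would treat WALK_RESTART as "start the walk over" and WALK_BROKEN as a bug to trace. As noted in the thread, a NULL check alone only hides the symptom if the parent is never supposed to be NULL.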