On Wed, Sep 20, 2017 at 10:55 AM, Paweł Staszewski <pstaszew...@itcare.pl> wrote:
>
>
> On 2017-09-20 at 19:50, Cong Wang wrote:
>
> On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet <eric.duma...@gmail.com>
> wrote:
>
> Sorry for top-posting, but this is to give context to Wei, since Pawel
> used a top posting way to report his bisection.
>
> Wei, can you take a look at Pawel report ?
>
> Crash happens in dst_destroy() at following :
>
> if (dst->dev)
>     dev_put(dst->dev);    <<CRASH>>
>
>
> dst->dev is not NULL, but netdev->pcpu_refcnt is NULL
>
> 65 ff 08    decl %gs:(%rax)    // CRASH since rax = NULL
>
>
>
> Pawel, please share your netdevices and routing setup ?
>
> Looks like a double dev_put() on some dev...
>
> Pawel, do you have any idea how this is triggered? Does your
> test try to remove some network device? If so which one?
> I noticed you have at least multiple vlan, bond and ixgbe
> devices.
>
> Just after i start bgp sessions
> So when host is starting i have all bgp sessions to upstreams shutdown
>
> To trigger panic i just enable all 6x bgp sessions at once to upstreams -
> and zebra is start to pull prefixes and push them to the kernel
>
> Then some traffic is generated from test hosts thru this backup router and
> panic is generated - every time after 10 to 15 seconds after bgp sessions
> are connected.
>
> I'm not removing any interface at this time or do anything with interfaces -
> just wait.
>
> And yes there are vlans attached to the bond devices
> but dmesg at this time shows nothing about interfaces or flaps.

This is very odd. We only free a netdevice in free_netdev(), and that is only called when we unregister a netdevice. Otherwise pcpu_refcnt cannot be NULL.
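
To make the reasoning concrete, here is a simplified userspace model, not the actual kernel code. All model_* names are made up for illustration, and a plain int pointer stands in for the per-cpu pcpu_refcnt. It only shows the ordering problem: once the counter has been freed and its pointer cleared, any later put through a stale device pointer has nothing valid to decrement, which is where the real dev_put() would fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct model_netdev {
    int *refcnt;    /* stands in for the per-cpu pcpu_refcnt pointer */
};

/* Allocate a model device with a zeroed counter; NULL on failure. */
static struct model_netdev *model_alloc_netdev(void)
{
    struct model_netdev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcnt = calloc(1, sizeof(*dev->refcnt));
    if (!dev->refcnt) {
        free(dev);
        return NULL;
    }
    return dev;
}

/* Take a reference; only valid while the counter still exists. */
static void model_dev_hold(struct model_netdev *dev)
{
    assert(dev && dev->refcnt);
    (*dev->refcnt)++;
}

/*
 * Drop a reference. Returns false in the case where the real code
 * would dereference a NULL counter pointer and crash.
 */
static bool model_dev_put(struct model_netdev *dev)
{
    if (!dev->refcnt)
        return false;
    (*dev->refcnt)--;
    return true;
}

/*
 * Models only the counter teardown: free it and clear the pointer.
 * The struct itself stays allocated in this model so that a stale
 * pointer to it can be inspected safely; the caller frees it later.
 */
static void model_free_netdev(struct model_netdev *dev)
{
    free(dev->refcnt);
    dev->refcnt = NULL;
}
```

In this model a hold followed by a put works as expected, while a put issued after model_free_netdev() returns false. That corresponds to the state in the report: dst->dev is still non-NULL, but the counter behind it is already gone.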