________________________________________
> From: Mahesh Bandewar (महेश बंडेवार) <mahe...@google.com>
> Sent: Friday, April 21, 2017 12:23 PM
> To: Ghalam, Joe
> Cc: herb...@gondor.apana.org.au; David Miller; Wichmann, Clifford; 
> linux-netdev
> Subject: Re: macvlan: Fix device ref leak when purging bc_queue

> May be the system is busy and snapshot is too small, and eventually
> process_broadcast() should get called. Deleting a slave does nothing
> about cancelling the work-queue so it would happen eventually.

> The change that Herbert proposed is correct. When packets are enqueued
> for processing later a dev reference is taken and it's removed when
> it's processed when it gets scheduled. The backlog is per port so it
> makes sense to remove reference(s) before purging the queue prior to
> deleting the port.

I only included the part of the logs that is relevant. The system in 
question has been left in that state for hours without process_broadcast() 
ever being called. And yes, I did check the CPU load: the system was running 
at around 20% load, so I don't think that's the case. I would suggest taking 
a closer look at the code in macvlan_dellink(), where it performs the unlink 
and unregister:

void macvlan_dellink(struct net_device *dev, struct list_head *head)
{
        struct macvlan_dev *vlan = netdev_priv(dev);
        list_del_rcu(&vlan->list);
        unregister_netdevice_queue(dev, head);
        netdev_upper_dev_unlink(vlan->lowerdev, dev);
}

As I stated in my initial reply to Herbert, the code change he suggested is 
correct and needed, but not enough. We have tested with his change and 
observed the same behavior. I can guarantee you that the change to 
macvlan_port_destroy() has no effect on this issue, since 
macvlan_port_destroy() is not even called during the operation.
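
For readers following the thread, the reference pattern under discussion can 
be sketched in user space. This is only a toy model under stated assumptions, 
not the macvlan code: every name below (toy_dev, toy_pkt, toy_queue, 
toy_enqueue, toy_purge, toy_leaked_refs) is hypothetical, and the reference 
count is a plain int rather than the kernel's per-device refcount. It shows 
that a purge which frees queued packets without dropping the reference each 
one holds leaves the device count above zero.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a net_device: only a reference count. */
struct toy_dev {
    int refcnt;
};

/* Hypothetical stand-in for a queued packet that pins its device. */
struct toy_pkt {
    struct toy_dev *dev;
    struct toy_pkt *next;
};

struct toy_queue {
    struct toy_pkt *head;
};

static void toy_dev_hold(struct toy_dev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_dev *dev)  { dev->refcnt--; }

/* Enqueueing takes one device reference per queued packet. */
static int toy_enqueue(struct toy_queue *q, struct toy_dev *dev)
{
    struct toy_pkt *pkt = malloc(sizeof(*pkt));

    if (!pkt)
        return -1;
    toy_dev_hold(dev);
    pkt->dev = dev;
    pkt->next = q->head;
    q->head = pkt;
    return 0;
}

/* Purge frees every packet; drop_refs selects whether refs are released. */
static void toy_purge(struct toy_queue *q, int drop_refs)
{
    while (q->head) {
        struct toy_pkt *pkt = q->head;

        q->head = pkt->next;
        if (drop_refs)
            toy_dev_put(pkt->dev);
        free(pkt);
    }
}

/* Returns the refcount left after enqueueing n packets and purging. */
int toy_leaked_refs(int n, int drop_refs)
{
    struct toy_dev dev = { .refcnt = 0 };
    struct toy_queue q = { .head = NULL };

    for (int i = 0; i < n; i++)
        if (toy_enqueue(&q, &dev) != 0)
            break;
    toy_purge(&q, drop_refs);
    return dev.refcnt;
}
```

In this model, toy_leaked_refs(3, 0) leaves three references outstanding, 
while toy_leaked_refs(3, 1) returns the count to zero. The sketch covers only 
the purge path; it says nothing about paths where the purge is never reached.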

Here is a stack trace that I forced in order to show the removal call:
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d312c>] __netdev_adjacent_dev_remove+0x3c/0x1a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bb6e87>] __netdev_adjacent_dev_unlink_lists+0x67/0x69
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810d32a0>] __netdev_adjacent_dev_unlink+0x82/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811d31e0>] netdev_upper_dev_unlink+0x10/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff8180e770>] macvlan_dellink+0x50/0x130
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2ca27>] rtnl_dellink+0xb7/0x120
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a609ab>] ? __netlink_ns_capable+0x3b/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a6c5>] rtnetlink_rcv_msg+0x95/0x250
Apr 20 06:23:40 OS10 kernel:  [<ffffffff811c1499>] ? zone_statistics+0x89/0xa0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0a9de>] ? __alloc_skb+0x7e/0x2a0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a630>] ? rtnetlink_rcv+0x30/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64f59>] netlink_rcv_skb+0xa9/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a2a628>] rtnetlink_rcv+0x28/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64603>] netlink_unicast+0xf3/0x200
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a64a1e>] netlink_sendmsg+0x30e/0x680
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a014fb>] sock_sendmsg+0x8b/0xc0
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a011ee>] ? move_addr_to_kernel.part.18+0x1e/0x60
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a01ff1>] ? move_addr_to_kernel+0x21/0x30
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a018f6>] ___sys_sendmsg+0x376/0x390
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a0019f>] ? sock_destroy_inode+0x2f/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff810a161c>] ? __do_page_fault+0x20c/0x560
Apr 20 06:23:40 OS10 kernel:  [<ffffffff812279ad>] ? dput+0xad/0x180
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81230a74>] ? mntput+0x24/0x40
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81212a50>] ? __fput+0x190/0x220
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a026b2>] __sys_sendmsg+0x42/0x80
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81a02702>] SyS_sendmsg+0x12/0x20
Apr 20 06:23:40 OS10 kernel:  [<ffffffff81bc86cd>] system_call_fast_compare_end+0x10/0x15
