I built a kernel with ref tracking enabled and reproduced the issue. I'm
not certain why these refs are still held. It seems to indicate that the
device is still active in some capacity, such that its routing info
hasn't been released.

I was also able to reproduce the issue on the stable branches linux-6.1.y
and linux-6.5.y. So far I haven't found an unaffected version.

[  155.941715] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
[  155.949918] ref_tracker: veth_A-R1@ffff0000b894a570 has 1/4 users at
                    dst_init+0x94/0x198
                    dst_alloc+0x68/0xc8
                    ip6_dst_alloc+0x30/0xc8
                    ip6_pol_route+0x2b8/0x4b8
                    ip6_pol_route_output+0x2c/0x60
                    fib6_rule_lookup+0x114/0x258
                    ip6_route_output_flags+0x100/0x248
                    ip6_update_pmtu+0xb0/0x150
                    ip6_tnl_err.isra.0+0x184/0x2d8 [ip6_tunnel]
                    ip6ip6_err+0x64/0x1e0 [ip6_tunnel]
                    tunnel6_err+0x78/0x108 [tunnel6]
                    icmpv6_notify+0xe4/0x268
                    icmpv6_rcv+0x1d8/0x6b0
                    ip6_protocol_deliver_rcu+0xac/0x530
                    ip6_input_finish+0x54/0xa0
                    ip6_input+0x54/0x110

[  156.028335] ref_tracker: veth_A-R1@ffff0000b894a570 has 1/4 users at
                    dst_init+0x94/0x198
                    dst_alloc+0x68/0xc8
                    ip6_dst_alloc+0x30/0xc8
                    ip6_pol_route+0x2b8/0x4b8
                    ip6_pol_route_output+0x2c/0x60
                    fib6_rule_lookup+0x114/0x258
                    ip6_route_output_flags+0x100/0x248
                    ip6_tnl_xmit+0x6b8/0xb40 [ip6_tunnel]
                    ip6_tnl_start_xmit+0x350/0x580 [ip6_tunnel]
                    dev_hard_start_xmit+0xa8/0x240
                    __dev_queue_xmit+0x1d8/0xf48
                    neigh_connected_output+0xc8/0x158
                    ip6_finish_output2+0x2dc/0x998
                    ip6_finish_output+0x228/0x400
                    ip6_output+0x88/0x208
                    NF_HOOK.constprop.0+0x58/0x118

[  156.108437] ref_tracker: veth_A-R1@ffff0000b894a570 has 1/4 users at
                    netdev_get_by_index+0x5c/0xc8
                    fib6_nh_init+0x1f0/0x8f8
                    rtm_new_nexthop+0x604/0x1540
                    rtnetlink_rcv_msg+0x148/0x410
                    netlink_rcv_skb+0x6c/0x160
                    rtnetlink_rcv+0x24/0x50
                    netlink_unicast+0x328/0x398
                    netlink_sendmsg+0x270/0x468
                    __sock_sendmsg+0x80/0x108
                    ____sys_sendmsg+0x28c/0x348
                    ___sys_sendmsg+0xbc/0x140
                    __sys_sendmsg+0x94/0x120
                    __arm64_sys_sendmsg+0x30/0x60
                    invoke_syscall+0x7c/0x130
                    el0_svc_common.constprop.0+0x4c/0x140
                    do_el0_svc+0x28/0x58

[  156.185520] ref_tracker: veth_A-R1@ffff0000b894a570 has 1/4 users at
                    ipv6_add_dev+0x214/0x728
                    addrconf_notify+0x238/0x780
                    notifier_call_chain+0x80/0x168
                    raw_notifier_call_chain+0x24/0x58
                    call_netdevice_notifiers_info+0x64/0x100
                    register_netdevice+0x6bc/0x860
                    veth_newlink+0x1e8/0x4a0 [veth]
                    __rtnl_newlink+0x658/0x8d8
                    rtnl_newlink+0x94/0xe8
                    rtnetlink_rcv_msg+0x148/0x410
                    netlink_rcv_skb+0x6c/0x160
                    rtnetlink_rcv+0x24/0x50
                    netlink_unicast+0x328/0x398
                    netlink_sendmsg+0x270/0x468
                    __sock_sendmsg+0x80/0x108
                    ____sys_sendmsg+0x28c/0x348
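
The four reports above can be summarised by the function that took each reference. The following is only a minimal sketch in Python, assuming the dmesg text keeps the layout shown here; the names `parse_ref_tracker` and `allocation_sites` and both regular expressions are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Header line, e.g.
# "[  155.949918] ref_tracker: veth_A-R1@ffff0000b894a570 has 1/4 users at"
HEADER = re.compile(
    r"ref_tracker: (?P<dev>\S+)@(?P<addr>[0-9a-f]+) "
    r"has (?P<users>\d+)/(?P<total>\d+) users at"
)
# Stack frame, e.g. "dst_init+0x94/0x198" or
# "ip6_tnl_xmit+0x6b8/0xb40 [ip6_tunnel]"
FRAME = re.compile(
    r"^\s*(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[\w+\])?\s*$"
)


def parse_ref_tracker(text):
    """Split dmesg text into one dict per ref_tracker report."""
    reports = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = {
                "device": header.group("dev"),
                "users": int(header.group("users")),
                "total": int(header.group("total")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group("sym"))
        elif not line.strip():
            # A blank line ends the current report.
            current = None
    return reports


def allocation_sites(reports):
    """Count reports by their innermost (first listed) frame."""
    return Counter(r["frames"][0] for r in reports if r["frames"])
```

Fed the output above, this would group the two `dst_init` reports together, which makes it easier to see that two of the four tracked references come from IPv6 dst entries.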

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2072501

Title:
  net:pmtu.sh of ubuntu_kselftests_net and subsequent subtests time out

Status in ubuntu-kernel-tests:
  New

Bug description:
  Found on N-nvidia-64k with node hinyari. After net:pmtu.sh times out,
  these tests also time out until "TEST SYSTEM FAILURE DETECTED" is
  eventually triggered: net:udpgso.sh, net:ip_defrag.sh,
  net:udpgso_bench.sh, net:fib_rule_tests.sh.

  14:44:14 INFO |       START   ubuntu_kselftests_net.net:pmtu.sh       ubuntu_kselftests_net.net:pmtu.sh       timeout=2700    timestamp=1719413054    localtime=Jun 26 14:44:14
  14:44:14 DEBUG| Persistent state client._record_indent now set to 2
  14:44:14 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kselftests_net.net:pmtu.sh', 'ubuntu_kselftests_net.net:pmtu.sh')
  14:44:14 DEBUG| Waiting for pid 43936 for 2700 seconds
  14:44:14 WARNI| System python is too old, crash handling disabled
  14:44:14 DEBUG| Running 'make run_tests -C net TEST_PROGS=pmtu.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
  14:44:14 DEBUG| [stdout] make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net'
  14:44:14 DEBUG| [stdout] TAP version 13
  14:44:14 DEBUG| [stdout] 1..1
  14:44:14 DEBUG| [stdout] # timeout set to 0
  14:44:14 DEBUG| [stdout] # selftests: net: pmtu.sh
  14:44:14 DEBUG| [stdout] # TEST: ipv4: PMTU exceptions                                         [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv6: PMTU exceptions                                         [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [ OK ]
  14:44:15 DEBUG| [stdout] # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions - nexthop objects   [ OK ]
  14:44:15 DEBUG| [stdout] # 2024/06/26 14:44:15 socat[44926] E xioopen_ipdgram_listen(): unknown address family 0
  14:44:15 DEBUG| [stdout] # TEST: UDPv4 with DSCP and ECN: PMTU exceptions                      [ OK ]
  14:44:15 DEBUG| [stdout] # ./pmtu.sh: line 937: kill: (44926) - No such process
  14:44:15 DEBUG| [stdout] # 2024/06/26 14:44:15 socat[45066] E xioopen_ipdgram_listen(): unknown address family 0
  14:44:15 DEBUG| [stdout] # TEST: UDPv4 with DSCP and ECN: PMTU exceptions - nexthop objects    [ OK ]
  14:44:15 DEBUG| [stdout] # ./pmtu.sh: line 937: kill: (45066) - No such process
  14:44:16 DEBUG| [stdout] # TEST: IPv4 over vxlan4: PMTU exceptions                             [ OK ]
  14:44:17 DEBUG| [stdout] # TEST: IPv4 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
  14:44:18 DEBUG| [stdout] # TEST: IPv6 over vxlan4: PMTU exceptions                             [ OK ]
  14:44:19 DEBUG| [stdout] # TEST: IPv6 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
  14:44:20 DEBUG| [stdout] # TEST: IPv4 over vxlan6: PMTU exceptions                             [ OK ]
  14:44:22 DEBUG| [stdout] # TEST: IPv4 over vxlan6: PMTU exceptions - nexthop objects           [ OK ]
  14:44:23 DEBUG| [stdout] # TEST: IPv6 over vxlan6: PMTU exceptions                             [ OK ]
  14:44:24 DEBUG| [stdout] # TEST: IPv6 over vxlan6: PMTU exceptions - nexthop objects           [ OK ]
  14:44:25 DEBUG| [stdout] # TEST: IPv4 over geneve4: PMTU exceptions                            [ OK ]
  14:44:26 DEBUG| [stdout] # TEST: IPv4 over geneve4: PMTU exceptions - nexthop objects          [ OK ]
  14:44:27 DEBUG| [stdout] # TEST: IPv6 over geneve4: PMTU exceptions                            [ OK ]
  14:44:28 DEBUG| [stdout] # TEST: IPv6 over geneve4: PMTU exceptions - nexthop objects          [ OK ]
  14:44:30 DEBUG| [stdout] # TEST: IPv4 over geneve6: PMTU exceptions                            [ OK ]
  14:44:31 DEBUG| [stdout] # TEST: IPv4 over geneve6: PMTU exceptions - nexthop objects          [ OK ]
  14:44:32 DEBUG| [stdout] # TEST: IPv6 over geneve6: PMTU exceptions                            [ OK ]
  14:44:33 DEBUG| [stdout] # TEST: IPv6 over geneve6: PMTU exceptions - nexthop objects          [ OK ]
  14:44:35 DEBUG| [stdout] # TEST: IPv4, bridged vxlan4: PMTU exceptions                         [ OK ]
  14:44:37 DEBUG| [stdout] # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects       [ OK ]
  14:44:40 DEBUG| [stdout] # TEST: IPv6, bridged vxlan4: PMTU exceptions                         [ OK ]
  14:44:42 DEBUG| [stdout] # TEST: IPv6, bridged vxlan4: PMTU exceptions - nexthop objects       [ OK ]
  14:44:44 DEBUG| [stdout] # TEST: IPv4, bridged vxlan6: PMTU exceptions                         [ OK ]
  14:44:46 DEBUG| [stdout] # TEST: IPv4, bridged vxlan6: PMTU exceptions - nexthop objects       [ OK ]
  14:44:49 DEBUG| [stdout] # TEST: IPv6, bridged vxlan6: PMTU exceptions                         [ OK ]
  14:44:51 DEBUG| [stdout] # TEST: IPv6, bridged vxlan6: PMTU exceptions - nexthop objects       [ OK ]
  14:44:53 DEBUG| [stdout] # TEST: IPv4, bridged geneve4: PMTU exceptions                        [ OK ]
  14:44:55 DEBUG| [stdout] # TEST: IPv4, bridged geneve4: PMTU exceptions - nexthop objects      [ OK ]
  14:44:58 DEBUG| [stdout] # TEST: IPv6, bridged geneve4: PMTU exceptions                        [ OK ]
  14:45:00 DEBUG| [stdout] # TEST: IPv6, bridged geneve4: PMTU exceptions - nexthop objects      [ OK ]
  14:45:02 DEBUG| [stdout] # TEST: IPv4, bridged geneve6: PMTU exceptions                        [ OK ]
  14:45:04 DEBUG| [stdout] # TEST: IPv4, bridged geneve6: PMTU exceptions - nexthop objects      [ OK ]
  14:45:07 DEBUG| [stdout] # TEST: IPv6, bridged geneve6: PMTU exceptions                        [ OK ]
  14:45:09 DEBUG| [stdout] # TEST: IPv6, bridged geneve6: PMTU exceptions - nexthop objects      [ OK ]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS vxlan4: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv6, OVS vxlan4: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS vxlan6: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv6, OVS vxlan6: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS geneve4: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv6, OVS geneve4: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv4, OVS geneve6: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv6, OVS geneve6: PMTU exceptions                            [SKIP]
  14:45:11 DEBUG| [stdout] # TEST: IPv4 over fou4: PMTU exceptions                               [ OK ]
  14:45:11 DEBUG| [stdout] # TEST: IPv4 over fou4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:11 DEBUG| [stdout] # TEST: IPv6 over fou4: PMTU exceptions                               [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv6 over fou4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv4 over fou6: PMTU exceptions                               [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv4 over fou6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv6 over fou6: PMTU exceptions                               [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv6 over fou6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv4 over gue4: PMTU exceptions                               [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv4 over gue4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv6 over gue4: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv4 over gue6: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv4 over gue6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue6: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv4: PMTU exceptions                               [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv6 over IPv4: PMTU exceptions                               [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv6 over IPv4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv6: PMTU exceptions                               [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv4 over IPv6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv6 over IPv6: PMTU exceptions                               [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv6 over IPv6: PMTU exceptions - nexthop objects             [ OK ]
  15:29:14 INFO | Timer expired (2700 sec.), nuking pid 43936
  15:29:14 INFO |               ERROR   ubuntu_kselftests_net.net:pmtu.sh       ubuntu_kselftests_net.net:pmtu.sh       timestamp=1719415754    localtime=Jun 26 15:29:14       Test timeout expired, rc=15
  15:29:14 INFO |       END ERROR       ubuntu_kselftests_net.net:pmtu.sh       ubuntu_kselftests_net.net:pmtu.sh       timestamp=1719415754    localtime=Jun 26 15:29:14
  15:29:14 DEBUG| Persistent state client._record_indent now set to 1
  15:29:14 DEBUG| Persistent state client.unexpected_reboot deleted
  15:29:14 DEBUG| Test has timeout: 2700 sec.
  15:29:14 INFO |       START   ubuntu_kselftests_net.net:udpgso.sh     ubuntu_kselftests_net.net:udpgso.sh     timeout=2700    timestamp=1719415754    localtime=Jun 26 15:29:14
  15:29:14 DEBUG| Persistent state client._record_indent now set to 2
  15:29:14 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kselftests_net.net:udpgso.sh', 'ubuntu_kselftests_net.net:udpgso.sh')
  15:29:14 DEBUG| Waiting for pid 54956 for 2700 seconds
  15:29:14 WARNI| System python is too old, crash handling disabled
  15:29:14 DEBUG| Running 'make run_tests -C net TEST_PROGS=udpgso.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
  15:29:14 DEBUG| [stdout] make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net'
  15:29:14 DEBUG| [stdout] TAP version 13
  15:29:14 DEBUG| [stdout] 1..1
  15:29:14 DEBUG| [stdout] # timeout set to 0
  15:29:14 DEBUG| [stdout] # selftests: net: udpgso.sh
  15:29:14 DEBUG| [stdout] # ipv4 cmsg
  16:14:14 INFO | Timer expired (2700 sec.), nuking pid 54956
  16:14:14 INFO |               ERROR   ubuntu_kselftests_net.net:udpgso.sh     ubuntu_kselftests_net.net:udpgso.sh     timestamp=1719418454    localtime=Jun 26 16:14:14       Test timeout expired, rc=15
  16:14:14 INFO |       END ERROR       ubuntu_kselftests_net.net:udpgso.sh     ubuntu_kselftests_net.net:udpgso.sh     timestamp=1719418454    localtime=Jun 26
  ...

  
  This is accompanied by the following dmesg entries; "waiting for veth_A-R1
  to become free" appears to repeat until the system is shut down:
  [  574.404587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  584.652587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  594.900587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  605.148587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  615.396587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  625.644587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  635.892587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  646.140587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  656.388587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  666.636587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  676.884586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  687.132586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  697.380587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  707.628586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  717.876586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  728.124587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  738.284631] INFO: task modprobe:54505 blocked for more than 122 seconds.
  [  738.291488]       Not tainted 6.8.0-1009-nvidia-64k #9-Ubuntu
  [  738.297364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  738.372587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  748.620586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  ...
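
  The repetition in the dmesg excerpt above can be checked mechanically. This is only a rough sketch in Python, assuming the dmesg output has been captured as text; the helper names `waiting_events` and `intervals` are hypothetical, and the pattern tolerates the message being wrapped before "Usage count".

```python
import re

# Matches "[  574.404587] unregister_netdevice: waiting for veth_A-R1 to
# become free. Usage count = 5", even if a mail client wrapped the line.
WAITING = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+unregister_netdevice: waiting for "
    r"(?P<dev>\S+) to become free\.\s+Usage count = (?P<count>\d+)"
)


def waiting_events(dmesg_text):
    """Return (timestamp, device, usage_count) for each waiting message."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("count")))
        for m in WAITING.finditer(dmesg_text)
    ]


def intervals(events):
    """Seconds between consecutive waiting messages, rounded to ms."""
    stamps = [ts for ts, _, _ in events]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
```

  A steady interval with an unchanged usage count, as in the excerpt above, suggests that the references are never dropped rather than being released slowly.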

