Stephen Hemminger <step...@networkplumber.org> writes:

> These patches change how teardown of Hyper-V network devices
> is done. These are tested on WS2012 and WS2016.
>
> It moves the tx/rx shutdown into the rndis close handling,
> and that makes earlier gpadl changes unnecessary.
>
Thank you Stephen. I gave these a try and they didn't survive my 'death row'
test on WS2016. I run 3 things in parallel:

1) iperf to some external IP
2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done
3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; done

I ended up with a hang:

[ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds.
[ 1226.712397]       Not tainted 4.15.0-rc9+ #321
[ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1226.716724] ip              D    0  2357   1474 0x00000000
[ 1226.718698] Call Trace:
[ 1226.719588]  ? __schedule+0x1da/0x7b0
[ 1226.720910]  ? get_page_from_freelist+0x106d/0x15c0
[ 1226.722648]  schedule+0x28/0x80
[ 1226.723807]  schedule_preempt_disabled+0xa/0x10
[ 1226.725952]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1226.727915]  ? rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.729849]  rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.731611]  ? rtnl_calcit.isra.28+0x110/0x110
[ 1226.733824]  netlink_rcv_skb+0x4a/0x120
[ 1226.736916]  netlink_unicast+0x19d/0x250
[ 1226.738907]  netlink_sendmsg+0x2a5/0x3a0
[ 1226.740762]  sock_sendmsg+0x30/0x40
[ 1226.742552]  SYSC_sendto+0x10e/0x140
[ 1226.744310]  ? __do_page_fault+0x26d/0x4c0
[ 1226.746332]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d
[ 1226.750776] RSP: 002b:00007ffd0a3455e8 EFLAGS: 00000246

[ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds.
[ 1349.595358]       Not tainted 4.15.0-rc9+ #321
[ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1349.600638] kworker/3:6     D    0  1586      2 0x80000000
[ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work
[ 1349.605779] Call Trace:
[ 1349.607080]  ? __schedule+0x1da/0x7b0
[ 1349.608856]  ? update_load_avg+0x563/0x6d0
[ 1349.610834]  ? update_curr+0xb9/0x190
[ 1349.613050]  schedule+0x28/0x80
[ 1349.615290]  schedule_preempt_disabled+0xa/0x10
[ 1349.617306]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1349.619072]  ? addrconf_verify_work+0xa/0x20
[ 1349.621108]  addrconf_verify_work+0xa/0x20
[ 1349.623107]  process_one_work+0x188/0x380
[ 1349.625012]  worker_thread+0x2e/0x390
[ 1349.626976]  ? process_one_work+0x380/0x380
[ 1349.628925]  kthread+0x111/0x130
[ 1349.630498]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1349.632786]  ? do_group_exit+0x3a/0xa0
[ 1349.634598]  ret_from_fork+0x35/0x40

....

(I'm not 100% sure this is a _new_ issue btw; it may be that the race was
always there and it's just easier to trigger now.)

I'll try to do more testing next week.

Thanks,

-- 
Vitaly
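
P.S. In case it helps with reproducing, here is a rough sketch of the stress
loop above as a single script. The interface name and iperf target below are
placeholders (the source only says "ethX" and "some external IP"), so
substitute the synthetic NIC and an external iperf server from your setup:

#!/bin/sh
# Rough repro sketch for the 'death row' test; IFACE and IPERF_SERVER
# are placeholders, adjust to your environment.
IFACE=eth0
IPERF_SERVER=192.0.2.1

# 1) sustained traffic to an external IP (run for an hour)
iperf -c "$IPERF_SERVER" -t 3600 &

# 2) flip the channel count between 6 and 8
while true; do
    ethtool -L "$IFACE" combined 6
    ethtool -L "$IFACE" combined 8
done &

# 3) flip the MTU between 1400 and 1450
while true; do
    ip link set dev "$IFACE" mtu 1400
    ip link set dev "$IFACE" mtu 1450
done &

wait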