This bug was fixed in the package linux-aws - 5.4.0-1021.21

---------------
linux-aws (5.4.0-1021.21) focal; urgency=medium

  * focal/linux-aws: 5.4.0-1021.21 -proposed tracker (LP: #1888811)

  * xen-netfront: potential deadlock in xennet_remove() (LP: #1888510)
    - SAUCE: xen-netfront: fix potential deadlock in xennet_remove()

 -- Stefan Bader <stefan.ba...@canonical.com>  Fri, 24 Jul 2020 11:24:21 +0200

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1888510

Title:
  xen-netfront: potential deadlock in xennet_remove()

Status in linux-aws package in Ubuntu:
  Invalid
Status in linux-aws-5.3 package in Ubuntu:
  Invalid
Status in linux-aws source package in Bionic:
  Incomplete
Status in linux-aws-5.3 source package in Bionic:
  Fix Released
Status in linux-aws source package in Focal:
  Fix Released
Status in linux-aws-5.3 source package in Focal:
  Invalid

Bug description:
  [Impact]

  During our AWS testing we experienced deadlocks on hibernate across
  all Xen instance types. The trace showed that the system was stuck in
  xennet_remove():

  [ 358.109087] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
  [ 358.115102] modprobe D 0 4892 4833 0x00004004
  [ 358.115104] Call Trace:
  [ 358.115112] __schedule+0x2a8/0x670
  [ 358.115115] schedule+0x33/0xa0
  [ 358.115118] xennet_remove+0x1f0/0x230 [xen_netfront]
  [ 358.115121] ? wait_woken+0x80/0x80
  [ 358.115124] xenbus_dev_remove+0x51/0xa0
  [ 358.115126] device_release_driver_internal+0xe0/0x1b0
  [ 358.115127] driver_detach+0x49/0x90
  [ 358.115129] bus_remove_driver+0x59/0xd0
  [ 358.115131] driver_unregister+0x2c/0x40
  [ 358.115132] xenbus_unregister_driver+0x12/0x20
  [ 358.115134] netif_exit+0x10/0x7aa [xen_netfront]
  [ 358.115137] __x64_sys_delete_module+0x146/0x290
  [ 358.115140] do_syscall_64+0x5a/0x130
  [ 358.115142] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  This prevented hibernation from completing.

  The cause of this problem is a race condition in xennet_remove(): the
  system reads the current state of the bus, requests a change of state
  to "Closing", and then waits for the state to become "Closing".
  However, if the state becomes "Closed" between reading the state and
  requesting the state change, we are stuck forever, because the state
  will never change from "Closed" back to "Closing".

  [Test case]

  Create any Xen-based instance in AWS and hibernate/resume it multiple
  times. Sometimes the system gets stuck (hung task timeout).

  [Fix]

  Prevent the deadlock by changing the wait condition to also check for
  state == Closed.

  [Regression potential]

  Minimal: this change affects only Xen, and more precisely only the
  xen-netfront driver.
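
To make the race described in the bug report easier to follow, here is a small userspace model of the two wait conditions. It is only a sketch, not the kernel patch itself: the enum, the function names (wake_on_closing_only, wake_on_closing_or_closed, waiter_wakes) and the idea of replaying a list of observed states are all hypothetical and were made up for illustration. The real driver sleeps on a kernel wait queue instead of walking an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the bus states named in the report. */
enum backend_state { STATE_CONNECTED, STATE_CLOSING, STATE_CLOSED };

typedef bool (*wait_cond_fn)(enum backend_state);

/* Wait condition as described for the buggy code: only "Closing" wakes us. */
bool wake_on_closing_only(enum backend_state s)
{
    return s == STATE_CLOSING;
}

/* Wait condition as described under [Fix]: "Closed" also ends the wait. */
bool wake_on_closing_or_closed(enum backend_state s)
{
    return s == STATE_CLOSING || s == STATE_CLOSED;
}

/*
 * Replay the states the waiter would observe, in order.
 * Returns true if the wait condition becomes true at some point,
 * false if the waiter would still be asleep after the last state
 * (which models "stuck forever" when the final state is terminal).
 */
bool waiter_wakes(wait_cond_fn cond, const enum backend_state *observed, size_t n)
{
    assert(cond != NULL);
    assert(observed != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (cond(observed[i]))
            return true;
    }
    return false;
}
```

In this model the racy case is a sequence that contains "Closed" but never "Closing": waiter_wakes() returns false with wake_on_closing_only and true with wake_on_closing_or_closed. A sequence that does pass through "Closing" wakes the waiter with either condition, which matches the report's statement that the hang only shows up some of the time.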