** Changed in: linux-aws (Ubuntu Jammy) Status: In Progress => Fix Committed
** Changed in: linux-aws (Ubuntu Kinetic) Status: In Progress => Fix Committed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/1968062 Title: Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types Status in linux-aws package in Ubuntu: Fix Committed Status in linux-aws source package in Jammy: Fix Committed Status in linux-aws source package in Kinetic: Fix Committed Bug description: [Impact] Hibernation currently fails for all AWS Xen instance types (c3/c4/i3/m3/m4/r3/r4/t2) with all Jammy 5.15 and Kinetic 5.19 linux- aws kernels. When attempting to hibernate, the system gets stuck in sync_inodes_one_sb() when processing the rootfs, fails to hibernate, and shuts down. When you start the instance, it starts fresh, and does not resume from the incomplete hibernation image. Networking is also broken, and you cannot ssh in. Upon review of the jammy/linux-aws git log, it appears that the kernel is missing AWS hibernation enablement patches entirely. These need to be included to get hibernation working. [Fix] Hibernation currently works on the Amazon Linux 2 5.15 Kernel: https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline After careful review of the amazon-5.15.y/mainline branch, we have found the below set of patches authored by Amazon AWS Hibernation team to be minimally sufficient to get hibernation working on both Jammy 5.15 and Kinetic 5.19. xen: Restore xen-pirqs on resume from hibernation xen-netfront: call netif_device_attach on resume xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs. xen: restore pirqs on resume from hibernation. block: xen-blkfront: consider new dom0 features on restore x86: tsc: avoid system instability in hibernation xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq Revert "xen: dont fiddle with event channel masking in suspend/resume" PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA x86/xen: close event channels for PIRQs in system core suspend callback xen/events: add xen_shutdown_pirqs helper function x86/xen: save and restore steal clock xen/time: introduce xen_{save,restore}_steal_clock xen-netfront: add callbacks for PM suspend and hibernation support xen-blkfront: add callbacks for PM suspend and hibernation x86/xen: add system core suspend and resume callbacks x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume xenbus: add freeze/thaw/restore callbacks support xen/manage: introduce helper function to know the on-going suspend mode xen/manage: keep track of the on-going suspend mode These patches will be carried as SAUCE patches, and their subjects marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, with the repo being the Amazon Linux 2 kernel repo. [Testcase] 1. Log into Amazon EC2. 2. Select Launch Instance. 3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium. 4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane. 5. Select your SSH keypair. 6. In storage, select 20gb. Go to the advanced tab, and set Encrypted: Yes. 7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable. 8. Create the Instance. SSH in. 9. Wait 5 minutes for hibinit-agent to create /swap-hibinit swapfile and configure grub. 10. Start a screen session. Echo some text and then detach with ctrl-d. 11. Log out from instance. 12. In EC2, select "Instance State" > "Hibernate". 13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped". 14. Start the instance again. 15. SSH in. 16. Attempt to resume screen session with "screen -r". If you are not able to ssh into the instance, hibernation had failed. If ssh works and the screen session is still running, hibernation was successful. Alternatively, the CPC team can run their Hibernation testsuite over Jammy and Kinetic. We have built test kernels for Jammy and Kinetic with the patches, and they are available in the below ppa: https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate- test If you try and hibernate and resume with the test kernels, hibernation is successful. [Where problems could occur] We are adding a significant amount of code to the Xen subsystem, spread across many commits. This code has not been mainlined, and is instead maintained out of tree by the Amazon AWS Hibernation team. The changes target hibernation, block devices, and clock devices, specific to those used on AWS Xen instances. Most of these patches have been applied to Xenial, Bionic, Focal and other series for a long time, but some patches are new for 5.15 onward. The changes will only target linux-aws to try and limit regression risk to AWS users, and any regressions will be limited to users of Xen based instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen 4.11. If a regression were to occur, the instance would likely fail to hibernate, and at worst, write an incomplete hibernation image to the swapfile. The kernel will see this on start, and instead of resuming from the hibernation image, will start fresh. It is unlikely to cause any filesystem corruption on the rootfs, but any in progress computations at the time of hibernation could be lost. The current broken behaviour breaks networking, and users would have to power cycle the instance a few times before they can ssh in again. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1968062/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp