Public bug reported:

[Impact]

Hibernation is still unreliable on c5.18xlarge instances. The system
usually hibernates correctly, but on resume it either performs a regular
reboot instead of resuming from hibernation, or it gets completely stuck
after the hibernated kernel is loaded into memory (more precisely, the
system is stuck while the resume callbacks of the hibernated kernel are
being executed).

[Test plan]

Create a c5.18xlarge instance, run the memory stress test script (the
same script that we use to stress test hibernation), trigger the
hibernate event, then trigger the resume event. Repeat a few times; the
problem is very likely to occur.

[Fix]

Amazon pointed out two fixes that should address both issues:
1) Upstream patch "PM: hibernate: flush swap writer after marking": this
prevents the regular reboot issue, because it ensures that the I/O is
always flushed after, not before, writing the hibernation signature.

2) Increase HVC_BOOT_ARRAY_SIZE to reserve more space for the
hv_clock_boot array: this is a temporary solution (SAUCE PATCH for now)
suggested by Amazon, who are working on a proper (more elegant) fix.
Doubling HVC_BOOT_ARRAY_SIZE seems to resolve the problem: we have
tested this change extensively in the AWS cloud and it seems to prevent
the "system stuck on resume" issue from happening.
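
The ordering problem addressed by the first fix can be illustrated with
a toy model. This is only a sketch, not the kernel code: the Disk class,
the block names and the "SIG" marker are hypothetical, and it assumes
that writes stay in a volatile cache until an explicit flush.

```python
class Disk:
    """Toy block device with a volatile write cache (hypothetical model)."""

    def __init__(self):
        self.cache = {}    # writes that are not durable yet
        self.durable = {}  # contents that survive a power-off

    def write(self, block, data):
        self.cache[block] = data

    def flush(self):
        # Make every cached write durable.
        self.durable.update(self.cache)
        self.cache.clear()

    def power_off(self):
        # Anything still sitting in the cache is lost.
        self.cache.clear()


def hibernate(disk, flush_after_marking):
    disk.write("image", "memory pages")
    if not flush_after_marking:
        disk.flush()            # old order: flush, then mark
    disk.write("signature", "SIG")
    if flush_after_marking:
        disk.flush()            # patched order: mark, then flush
    disk.power_off()


def boot(disk):
    # Resume only if the signature reached stable storage.
    return "resume" if disk.durable.get("signature") == "SIG" else "regular boot"


old = Disk()
hibernate(old, flush_after_marking=False)
new = Disk()
hibernate(new, flush_after_marking=True)
print(boot(old))  # regular boot
print(boot(new))  # resume
```

In this model the signature written after the last flush never becomes
durable, so the next boot finds no hibernation image and falls back to a
regular boot, which matches the first symptom described above.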

[Regression potential]

The first patch touches only the hibernation code, so potential
regressions can occur only in the hibernation scenario. The second patch
is more of a hack at the moment, and it affects kvmclock. Increasing
HVC_BOOT_ARRAY_SIZE could potentially introduce regressions on small
KVM systems; a better solution would be to allocate the hv_clock_boot
array dynamically, which is the proper fix that Amazon is currently
working on. Once that fix is published upstream, we should apply it and
drop this SAUCE PATCH.
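
The sizing trade-off can be sketched with some rough arithmetic. All
numbers here are assumptions, not taken from this report: 4 KiB pages,
one 64-byte cache-line-aligned entry per CPU, HVC_BOOT_ARRAY_SIZE
derived as one page divided by the entry size, and 72 vCPUs on a
c5.18xlarge.

```python
PAGE_SIZE = 4096   # assumed: 4 KiB pages on x86-64
ENTRY_SIZE = 64    # assumed: one cache-line-aligned clock entry per CPU
VCPUS = 72         # assumed: vCPU count of a c5.18xlarge


def boot_array_slots(pages):
    """Number of per-CPU entries that fit in a static array of `pages` pages."""
    return pages * PAGE_SIZE // ENTRY_SIZE


current = boot_array_slots(1)
doubled = boot_array_slots(2)

print(current, current >= VCPUS)  # 64 False
print(doubled, doubled >= VCPUS)  # 128 True
print(doubled * ENTRY_SIZE)       # 8192
```

If these assumptions hold, the static array has fewer slots than the
instance has vCPUs, and doubling it covers all of them. The cost is
that every guest, however small, reserves 8 KiB instead of 4 KiB
statically, which is why a dynamic allocation is the cleaner fix.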

** Affects: linux-aws (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1918694

Title:
  aws: fix hibernation issues on c5.18xlarge

