(cleaned the post up for readability, sorry about that)

On 01/25/2018 05:40 PM, Hans van Kranenburg wrote:
> This means that your vcpus want to execute work but are not being
> scheduled on a physical cpu core. Either the physical machine gets too
> much work from all the virtual machines that are requesting cpu time, or
> other things are going on, like your virtual machine getting paused
> (e.g. when doing live migration there's a handover moment when it's
> shortly paused and then resumed, this is also visible as a short 100%
> steal spike).

After going over log files, it appears that the issue started when Amazon did a live migration of the VM, probably for the Meltdown patching.

> A patch to fix that cpu accounting breakage (picked from linux 4.15) was
> included in 4.9.65-3. So only for the 4.9.0-3 (which actual version?)
> you could be seeing that one happening.

The affected versions were 4.9.30-2+deb9u2 and the latest, 4.9.65-3+deb9u2; both showed the problem. So basically the kernel never recovered properly after being paused during a live migration.

> Because of the mentioned steal time fix that was included in a version
> in between the 2 versions you mention, my first suggestion would be to
> see if the symptoms on the old and new kernel are exactly the same, or
> if they are only similar but different.
>
> Hans

I have already rebooted the system that was running the 4.9.65 kernel; before the reboot, its symptoms were the same as on the older kernel. The CPU usage stats went back to normal after the reboot.
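
For anyone wanting to compare the symptoms on two kernels, here is a rough sketch of how the steal percentage can be sampled from /proc/stat. The function names are made up for illustration, and it assumes the usual field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal). If the kernel's accounting itself is broken after a migration, the numbers will only reflect what the kernel reports, not what really happened.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (assumed, per proc(5)).
# The trailing guest and guest_nice fields are skipped on purpose: guest time
# is already counted inside user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("kernel does not report a steal field")
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of elapsed CPU ticks between two samples that was stolen."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas["steal"] / total


def measure_steal(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return steal_percent(first, second)
```

Calling print(measure_steal()) on the affected VM before and after a reboot gives one number per kernel to compare, instead of eyeballing top.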

--
Ryan Thoryk
r...@thoryk.com
r...@tliquest.net
