A couple more points: how many CPUs (sockets) does your motherboard have? Multi-socket machines are more or less in NUMA territory. Suboptimal process scheduling / memory allocation "decisions" (on the part of the CPU/process scheduler) can have "interesting" effects. Think of processes migrating repeatedly between NUMA nodes (a handful of CPU cores coupled to a local memory controller)... yummy. Anti-patterns like this should not really happen, though. A useful keyword here is CPU-process *affinity*.
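If you want to see how your CPUs map to NUMA nodes, and optionally nail a process to one node, here's a rough Python sketch (Linux only - it just reads sysfs and uses sched_setaffinity; numactl / taskset / lstopo will tell you the same things from the shell):

import glob
import os

def numa_nodes():
    """Return {node_id: set of CPU ids}, read from sysfs."""
    nodes = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        node_id = int(path.rsplit("node", 1)[1])
        cpus = set()
        with open(path + "/cpulist") as f:
            for part in f.read().strip().split(","):
                if "-" in part:
                    lo, hi = part.split("-")
                    cpus.update(range(int(lo), int(hi) + 1))
                elif part:
                    cpus.add(int(part))
        nodes[node_id] = cpus
    return nodes

if __name__ == "__main__":
    topo = numa_nodes()
    for node, cpus in sorted(topo.items()):
        print("node %d: cpus %s" % (node, sorted(cpus)))
    # Pin this process to node 0's CPUs, so the scheduler can no longer
    # migrate it to another NUMA node:
    if 0 in topo:
        os.sched_setaffinity(0, topo[0])
        print("now pinned to:", sorted(os.sched_getaffinity(0)))

In a virtualization context the same idea applies one level up: libvirt/QEMU can pin vCPUs and guest memory to a host NUMA node, which avoids exactly the kind of cross-node migration described above.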
Next, I should mention the emulated peripherals = hardware other than the CPU instruction set: the VGA adapter, NICs, storage controllers... Needless to say, I've never seen a problem with these that matches your problem description. If an emulated peripheral doesn't work for your guest OS, it generally hangs during boot already = it just doesn't work at all. If it does work, it tends to be blazing fast and efficient compared to historical real hardware :-)

Speaking of "unexpected performance degradations", that makes me think of memory garbage-collection runs. Typically I'd expect this in modern "interpreted" programming languages (or rather their runtime environments): Java, .NET, probably Python as well - although not all garbage-collection mechanisms do periodic cleanup. GC based on reference counting, for instance, works continuously as references are created and removed by the running user program...

In theory, something along the lines of GC can happen in the kernel too, in the "virtual memory management" department - it's called "compaction".
https://www.kernel.org/doc/html/latest/admin-guide/mm/concepts.html#compaction
https://pingcap.com/blog/linux-kernel-vs-memory-fragmentation-2
Compaction should result in faster "page table lookups", by decreasing the fragmentation of the mapping between physical RAM and virtual memory (as allocated to user-space processes and the kernel itself). If a "compaction run" gets triggered, it's hard for me to tell how long it can take to finish. Remember that it takes place entirely in RAM - it shouldn't be nearly as bad as, say, disk IO stalling due to insufficient IOps, or thermal throttling.

How to tell that it's compaction hampering your performance, while this is going on? Hmm. I'd look at the current CPU consumption on the part of kcompactd:
https://lwn.net/Articles/817905/
https://lore.kernel.org/lkml/20190126200005.GB27513@amd/T/

Now... how this works in a virtualized environment, that's a good question to me :-) Does the whole virtual memory allocation clockwork (multiple layers of page tables) and the compaction "GC" work in two layers, once for the host and once for the guest? Is there possibly some host-to-guest coordination? But that would break the rules of virtualization, right? Or does the host just allocate a single "huge page" to the guest anyway and not care anymore? Does it actually? I recall earlier debates about sparse allocation / overprovisioning (host to guest), ballooning and all that jazz... Good questions. Maybe look at swapping activity and kcompactd CPU consumption in the host instance and in the guest instances, separately (a rough sketch follows in the PS below)?

What about swapping, triggered by a dynamically occurring low-memory condition in the guest VM? That would show up in the disk IO stats (IOps skyrocketing).

Speaking of disk IO: back in the heyday of RAID built on top of spinning rust, I remember instances where an individual physical drive in a RAID would start to struggle. In our RAID boxes, which had an activity LED (and a failure LED) per physical drive, this would show quite clearly: while the struggling drive hadn't died yet, the whole RAID would merely slow down (noticeably); the healthy disk drives would just blink their activity LED every now and then, but the culprit drive's LED would indicate busy activity :-) This pattern showed under heavy sequential load (synthetic/deliberate), intended to use whatever bandwidth the RAID has to offer.

And that's about it for now... :-)

Frank
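PS: by "look at swapping activity and kcompactd CPU consumption" I mean something along these lines - a rough Python sketch, Linux only, that samples the kcompactd kernel threads' CPU time from /proc/<pid>/stat plus a few /proc/vmstat counters (pswpin / pswpout / compact_stall; check /proc/vmstat on your kernel, names can vary a bit between versions). Run it in the host and in a guest side by side while the slowdown is happening:

import glob
import time

def kcompactd_cpu_ticks():
    """Sum utime+stime (clock ticks) over all kcompactd* kernel threads."""
    total = 0
    for stat_path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat_path.rsplit("/", 1)[0] + "/comm") as f:
                comm = f.read().strip()
            if not comm.startswith("kcompactd"):
                continue
            with open(stat_path) as f:
                # take everything after the ")" that closes the comm field;
                # utime/stime are fields 14/15 of the stat line, i.e.
                # indices 11/12 after the split below
                fields = f.read().rsplit(")", 1)[1].split()
            total += int(fields[11]) + int(fields[12])
        except (OSError, IndexError, ValueError):
            continue  # process disappeared between glob and read
    return total

def vmstat_counters(keys=("pswpin", "pswpout", "compact_stall")):
    """Grab a few swap/compaction counters from /proc/vmstat."""
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in keys:
                counters[name] = int(value)
    return counters

if __name__ == "__main__":
    prev_ticks, prev_vm = kcompactd_cpu_ticks(), vmstat_counters()
    while True:
        time.sleep(5)
        ticks, vm = kcompactd_cpu_ticks(), vmstat_counters()
        deltas = {k: vm[k] - prev_vm.get(k, 0) for k in vm}
        print("kcompactd +%d ticks, deltas: %s" % (ticks - prev_ticks, deltas))
        prev_ticks, prev_vm = ticks, vm

If the kcompactd tick count climbs, or the pswpin/pswpout deltas jump, exactly during the slow periods, that points at compaction or swapping respectively; if everything stays flat, look elsewhere (disk IOps, thermal throttling).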