On 5 Oct 2021 at 18:58, Parnell Springmeyer wrote:
>
> Hi, we use QEMU VMs for running our integration testing
> infrastructure and have run into a very difficult to debug problem:
> occasionally we will see a severe performance degradation in some of
> our QEMU VMs.
>

If memory serves, QEMU guests appear to run as processes in the Linux host instance. I'm not enough "in the know" to tell you how much is possibly happening under the hood, on the kernel-support side of things, that is potentially not well described by the superficial abstraction visible in "top".
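Since each guest is just a process on the host, one way to look a little past what "top" shows is to sample the QEMU process's /proc/<pid>/stat from the host and split its CPU time. What follows is a minimal sketch, assuming Linux and that you already know the PID of the qemu-system-* process (e.g. via pgrep). Field numbers follow proc(5), where utime (field 14) already includes guest_time (field 43, time spent running guest vCPUs). The script name in the comment is hypothetical.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line: str) -> dict:
    """Extract CPU-time counters (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only after the *last* ')'. rest[0] is then field 3.
    rest = stat_line[stat_line.rindex(")") + 2:].split()

    def field(n: int) -> int:
        return int(rest[n - 3])

    return {
        "utime": field(14),       # user-mode time; per proc(5) this includes guest_time
        "stime": field(15),       # time spent in the host kernel for this process
        "guest_time": field(43),  # time spent running guest vCPUs
    }


def delta(before: dict, after: dict) -> dict:
    """Difference between two samples taken with parse_cpu_ticks()."""
    return {key: after[key] - before[key] for key in before}


def sample(pid: int, interval: float = 1.0) -> dict:
    """Read a process's stat twice, `interval` seconds apart; return tick deltas."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return delta(before, after)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python qemu_ticks.py <pid of qemu-system-*>
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    d = sample(int(sys.argv[1]))
    # utime minus guest_time roughly approximates QEMU's own user-space work
    host_user = d["utime"] - d["guest_time"]
    for name, value in (("guest vCPU", d["guest_time"]),
                        ("QEMU user-space", host_user),
                        ("host kernel", d["stime"])):
        print(f"{name:>16}: {value / ticks_per_sec:.2f} s")
```

One possible reading, offered as a hint rather than a rule: if guest_time stays near zero while utime is high, the guest may be running under software emulation rather than hardware acceleration; a large stime share points at work in the host kernel that "top" lumps into a single number.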
Esoteric issues aside (CPU arch incompatibilities between host and guest), have you tried inspecting what the load looks like, both in the guest and in the host OS instance? What does "top" show? With the CPU cores expanded (press "1")? Have you tried "latencytop", by any chance?

Are you sure this is a CPU performance/emulation issue? What storage are your VMs using? Could storage be the bottleneck? Isn't the observed "sluggishness" storage-IO-bound rather than CPU-bound? Can you tell the difference? (Heck... apologies, that's probably a series of dumb questions to someone @arista.com.)

Stuff can get sluggish when IRQs don't work right. Any signs of that in the guest instance? Interesting messages in dmesg, interesting numbers in /proc/interrupts?

CPU arch emulation (guest vs. host) might also be a factor. If you specify a different CPU core for the guest than the host actually has, you may hit some fringe parts of the instruction set, even within the x86_64 family, that need to be tediously emulated for the guest instance... Also, I'd hazard a guess that 32-bit vs. 64-bit *might* play a role, albeit a marginal one. I have fond memories of the 387 math co-processor emulation (and its effects on program runtime), but that was a *long* time ago :-)

I've seen EXT3 and EXT4 hang for no apparent reason, on bare metal, under heavy IOPS stress: CPU consumption at 0%, disk IOPS at pure 0, but the filesystem would block forever at a standstill. If I recall correctly, I used Bonnie++ to generate that kind of stress reproducibly, against fast block storage (HW RAID back then). There was no QEMU in the game.

Feel free to add some juicy detail for us to ponder :-)

Frank