Karl Berry wrote:
> iotop -b -o -n 5
>
> Thanks. I think -n 1 is better; -n 5 is too much information.
That is fine. I looked at the -n 1 output and it appeared to be missing processes that showed up in one of the subsequent passes, which is why I saved a few extra iterations. But to be honest I was simply trying to keep things moving and wasn't looking too closely.

> But the 7 cpus look mostly idle.
>
> I think it is almost certain that the underlying problem is disk I/O.
> That's been clear to me for quite a while, since the top processes are
> almost always in disk wait, as I said.

I now see that the dom0 is an 8-cpu host but is hosting 24 VMs. It isn't overbooked on RAM as far as I can tell, but if several of those VMs become active at around the same time, that would saturate the underlying host hardware. I/O saturation would line up with your observation that disk I/O is the problem. To confirm it, though, we really need to look at the entire set of VMs across the underlying dom0 host (I have sketched some commands for that at the end of this message). Having visibility into only some of the individual VMs has definitely made diagnosing this more difficult.

> What's not clear is how to resolve it, given that updating hardware
> is a massive, expensive, and unlikely-to-happen project.

Agreed. If all things were possible I would suggest adding more hardware. Now that I know everything is on one 8-cpu host, I think we are probably overloading the available hardware resources, and we should move services off of the current vcs.sv VM and onto additional hardware. I will stop there for the moment, because I agree that escalates into a serious project: I wouldn't want to add a single machine, I would want at least two so that there is redundant hardware in case of a failure, and supporting that much infrastructure would take real planning.
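Something like the following is roughly what I have in mind for getting that whole-host view. It is only a sketch and assumes iotop is installed on the guests and that xentop and sysstat's iostat are available on the dom0; the output paths are just placeholders.

    # On a guest: a few timestamped iotop passes in batch mode, showing only
    # processes doing I/O, so nothing that appears in a later pass is lost.
    iotop -b -o -t -n 5 -d 10 > /tmp/iotop-$(hostname).log

    # On the dom0: per-VM CPU and block-I/O counters, also in batch mode.
    xentop -b -d 10 -i 5 > /tmp/xentop-$(hostname).log

    # On the dom0: per-device utilization, to see whether the disks
    # themselves are saturated.
    iostat -x 10 5 > /tmp/iostat-$(hostname).log

Running these over the same window would let us line up per-VM activity with the device-level numbers before we commit to moving anything.

Bob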