On Thu, May 12, 2022 at 09:41:29AM +0200, Paolo Bonzini wrote: > On 5/11/22 18:54, Daniel P. Berrangé wrote: > > On Wed, May 11, 2022 at 01:07:47PM +0200, Paolo Bonzini wrote: > > > On 5/11/22 12:10, Daniel P. Berrangé wrote: > > > > I expect creating/deleting I/O threads is cheap in comparison to > > > > the work done for preallocation. If libvirt is using -preconfig > > > > and object-add to create the memory backend, then we could have > > > > option of creating the I/O threads dynamically in -preconfig mode, > > > > create the memory backend, and then delete the I/O threads again. > > > > > > I think this is very overengineered. Michal's patch is doing the obvious > > > thing and if it doesn't work that's because Libvirt is trying to > > > micromanage > > > QEMU. > > > > Calling it micromanaging is putting a very negative connotation on > > this. What we're trying todo is enforce a host resource policy for > > QEMU, in a way that a compromised QEMU can't escape, which is a > > valuable protection. > > I'm sorry if that was a bit exaggerated, but the negative connotation was > intentional. > > > > As mentioned on IRC, if the reason is to prevent moving around threads in > > > realtime (SCHED_FIFO, SCHED_RR) classes, that should be fixed at the > > > kernel > > > level. > > > > We use cgroups where it is available to us, but we don't always have > > the freedom that we'd like. > > I understand. I'm thinking of a new flag to sched_setscheduler that fixes > the CPU affinity and policy of the thread and prevents changing it in case > QEMU is compromised later. The seccomp/SELinux sandboxes can prevent > setting the SCHED_FIFO class without this flag. > > In addition, my hunch is that this works only because the RT setup of QEMU > is not safe against priority inversion. IIRC the iothreads are set with a > non-realtime priority, but actually they should have a _higher_ priority > than the CPU threads, and the thread pool I/O bound workers should have an > even higher priority; otherwise you have a priority inversion situation > where an interrupt is pending that would wake up the CPU, but the iothreads > cannot process it because they have a lower priority than the CPU.
At least for RHEL deployments of KVM-RT, IIC the expectation is that the VCPUs with RT priority never do I/O, and that there is at least 1 additional non-RT vCPU from which the OS performs I/O. IOW, the RT VCPU works in a completely self contained manner with no interaction to any other QEMU threads. If that's not the case, then you would have to make sure those other threads have priority / schedular adjustments to avoid priority inversion With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|