Hi,

we are using OpenStack to manage realtime guests. We modified it and contributed to discussions on how to model the realtime feature. More recent versions of OpenStack have realtime support, and there are a few proposals on how to improve that further.
But there is still no full answer on how to distribute threads across host cores. The vcpus are easy, but for the emulator and io-threads there are multiple options. I would like to collect the constraints from a qemu/kvm perspective first, and then possibly influence the OpenStack development.

I will put the summary/questions first; the text below provides more context on where the questions come from.

- How do you distribute your threads when reaching really low cyclictest results in the guests? In [3] Rik talked about problems like lock holder preemption, starvation etc., but not where/how to schedule emulator and io threads.
- Is it ok to put a vcpu and an emulator thread on the same core, as long as the guest knows about it? Any funny-behaving guest, not just Linux.
- Is it ok to make the emulators potentially slow by running them on busy best-effort cores, or will they quickly end up on the critical path if you do more than just cyclictest?
  - Our experience says we don't need them reactive, even with rt-networking involved.

Our goal is to reach a high packing density of realtime VMs. Our pragmatic first choice was to run all non-vcpu threads on a shared set of pcpus where we also run best-effort VMs and host load. Now the OpenStack guys are not too happy with that, because it is load outside the assigned resources, which leads to quota and accounting problems.

So the current OpenStack model is to run those threads next to one or more vcpu threads [1]. You need to remember that the vcpus in question should not be your rt-cpus in the guest, i.e. if vcpu0 shares its pcpu with the hypervisor noise, your preempt-rt guest would use isolcpus=1.

Is that kind of pcpu sharing really a good idea? I could imagine things like SMP housekeeping (cache invalidation etc.) eventually causing vcpu1 to wait for the emulator stuck in IO. Or a busy-polling vcpu0 starving its own emulator, causing high latency or even deadlocks. Even if it happens to work for Linux guests, it seems like a strong assumption that an rt-guest that has noise cores can deal with even more noise one scheduling level below.

More recent proposals [2] suggest a scheme where the emulator and io threads are on a separate core. That sounds more reasonable/conservative, but it dramatically increases the per-VM cost, and the pcpus hosting the hypervisor threads will probably be idle most of the time.

I guess in this context the most important question is whether qemu is ever involved in "regular operation" at all, if you avoid the obvious IO problems on your critical path.

My guess is that [1] alone has serious hidden latency problems, and [2] takes it a step too far by wasting whole cores on idle emulators. We would like to suggest some other way in between that is a little easier on the core count. Our current solution seems to work fine but has the mentioned quota problems. With this mail I am hoping to collect some constraints to derive a suggestion from, or maybe collect some information that could be added to the current blueprints as reasoning/documentation.

Sorry if you receive this mail a second time; I was not subscribed to openstack-dev the first time.
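For concreteness, here is a rough sketch of what the two placement schemes look like at the libvirt level for a 2-vcpu guest. The pcpu numbers are made up for illustration, and this is my reading of the specs rather than anything they prescribe verbatim:

    <vcpu placement='static'>2</vcpu>
    <cputune>
      <!-- [1]-style: the emulator shares a pcpu with the non-rt vcpu0 -->
      <vcpupin vcpu='0' cpuset='2'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <emulatorpin cpuset='2'/>
      <!-- only the rt vcpu gets an rt scheduling policy -->
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
    </cputune>

    <!-- [2]-style would instead pin the emulator to its own, otherwise idle pcpu -->
    <emulatorpin cpuset='4'/>

The guest then boots with isolcpus=1 and keeps its rt workload off vcpu0. In Nova terms, [1] maps to flavor extra specs along the lines of hw:cpu_policy=dedicated, hw:cpu_realtime=yes and hw:cpu_realtime_mask=^0 (vcpu0 as the housekeeping vcpu), while [2] proposes something like hw:emulator_threads_policy=isolate to get the dedicated emulator pcpu.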
best regards,
Henning

[1] https://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/libvirt-real-time.html
[2] https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/libvirt-emulator-threads-policy.html
[3] http://events.linuxfoundation.org/sites/events/files/slides/kvmforum2015-realtimekvm.pdf