On Tue, 27 Jun 2017 09:25:14 -0600, Chris Friesen <chris.frie...@windriver.com> wrote:
> On 06/27/2017 01:44 AM, Sahid Orentino Ferdjaoui wrote:
> > On Mon, Jun 26, 2017 at 10:19:12AM +0200, Henning Schild wrote:
> >> On Sun, 25 Jun 2017 10:09:10 +0200, Sahid Orentino Ferdjaoui
> >> <sferd...@redhat.com> wrote:
> >>
> >>> On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen wrote:
> >>>> On 06/23/2017 09:35 AM, Henning Schild wrote:
> >>>>> On Fri, 23 Jun 2017 11:11:10 +0200, Sahid Orentino Ferdjaoui
> >>>>> <sferd...@redhat.com> wrote:
> >>>>
> >>>>>> In the Linux RT context, and as you mentioned, the non-RT vCPU
> >>>>>> can acquire some guest kernel lock and then be preempted by the
> >>>>>> emulator thread while holding this lock. This situation blocks
> >>>>>> RT vCPUs from doing their work. That is why we have
> >>>>>> implemented [2]. For DPDK I don't think we have such problems
> >>>>>> because it's running in userland.
> >>>>>>
> >>>>>> So for the DPDK context I think we could have a mask like we
> >>>>>> have for RT, basically considering vCPU0 to handle best-effort
> >>>>>> work (emulator threads, SSH...). I think that is the current
> >>>>>> pattern used by DPDK users.
> >>>>>
> >>>>> DPDK is just a library, and one can imagine an application that
> >>>>> has cross-core communication/synchronisation needs where the
> >>>>> emulator slowing down vCPU0 will also slow down vCPU1. Your DPDK
> >>>>> application would have to know which of its cores did not get a
> >>>>> full pCPU.
> >>>>>
> >>>>> I am not sure what the DPDK example is doing in this discussion;
> >>>>> would that not just be cpu_policy=dedicated? I guess the normal
> >>>>> behaviour of dedicated is that emulators and io happily share
> >>>>> pCPUs with vCPUs, and you are looking for a way to restrict
> >>>>> emulators/io to a subset of pCPUs because you can live with some
> >>>>> of them not being at 100%.
> >>>>
> >>>> Yes. A typical DPDK-using VM might look something like this:
> >>>>
> >>>> vCPU0: non-realtime, housekeeping and I/O, handles all virtual
> >>>>   interrupts and "normal" linux stuff, emulator runs on same pCPU
> >>>> vCPU1: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU2: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU3: realtime, runs in tight loop in userspace processing packets
> >>>>
> >>>> In this context, vCPUs 1-3 don't really ever enter the kernel,
> >>>> and we've offloaded as much kernel work as possible from them
> >>>> onto vCPU0. This works pretty well with the current system.
> >>>>
> >>>>>> For RT we have to isolate the emulator threads to an additional
> >>>>>> pCPU per guest or, as you are suggesting, to a set of pCPUs for
> >>>>>> all the guests running.
> >>>>>>
> >>>>>> I think we should introduce a new option:
> >>>>>>
> >>>>>>   - hw:cpu_emulator_threads_mask=^1
> >>>>>>
> >>>>>> If in 'nova.conf', that mask will be applied to the set of all
> >>>>>> host CPUs (vcpu_pin_set) to basically pack the emulator threads
> >>>>>> of all VMs running there (useful for the RT context).
> >>>>>
> >>>>> That would allow modelling exactly what we need.
> >>>>> In nova.conf we are talking about absolute, known values; there
> >>>>> is no need for a mask, and a set is much easier to read. Also,
> >>>>> using the same name does not sound like a good idea.
> >>>>> And the name vcpu_pin_set clearly suggests what kind of load
> >>>>> runs here; if using a mask it should be called pin_set.
> >>>>
> >>>> I agree with Henning.
> >>>>
> >>>> In nova.conf we should just use a set, something like
> >>>> "rt_emulator_vcpu_pin_set", which would be used for running the
> >>>> emulator/io threads of *only* realtime instances.
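
Side note, to make the above concrete for readers of the archive: the
guest layout Chris describes can already be expressed with today's
flavor extra specs plus vcpu_pin_set; only the emulator-thread
placement is new. A rough sketch, with a made-up flavor name and CPU
numbers, and with the last line being nothing more than the proposal
from this thread:

    # nova.conf on the compute node: pCPUs nova may give to guests
    [DEFAULT]
    vcpu_pin_set = 2-7

    # 4-vCPU DPDK guest: dedicated pCPUs, vCPU0 kept non-realtime
    openstack flavor set dpdk.small \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_realtime=yes \
        --property hw:cpu_realtime_mask=^0

    # proposed only, does not exist in nova today:
    # hw:cpu_emulator_threads_mask=^1

Where the emulator and io threads of such a guest should land is
exactly the open question in this thread.
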
> >>> I don't agree with you: we have a set of pCPUs and we want to
> >>> subtract some of them for the emulator threads. We need a mask.
> >>> The only set we need is the one selecting which pCPUs Nova can
> >>> use (vcpu_pin_set).
> >>
> >> At that point it does not really matter whether it is a set or a
> >> mask. They can both express the same thing, and a set is easier to
> >> read/configure. With the same argument you could say that
> >> vcpu_pin_set should be a mask over the host's pCPUs.
> >>
> >> As I said before: vcpu_pin_set should be renamed because all sorts
> >> of threads are put here (pcpu_pin_set?). But that would be a
> >> bigger change and should be discussed as a separate issue.
> >>
> >> So far we have talked about a compute node doing realtime only. In
> >> that case vcpu_pin_set + emulator_io_mask would work. If you want
> >> to run regular VMs on the same host, you can run a second nova,
> >> like we do.
> >>
> >> We could also use vcpu_pin_set + rt_vcpu_pin_set(/mask). I think
> >> that would allow modelling all cases in just one nova. Having all
> >> in one nova, you could potentially repurpose RT CPUs to
> >> best-effort and back. Some day in the future ...
> >
> > That is not something we should allow, or at least advertise. A
> > compute node can't run both RT and non-RT guests, because such
> > nodes should run an RT kernel. We can't guarantee RT if both are on
> > the same node.
>
> A compute node with an RT OS could run RT and non-RT guests at the
> same time just fine. In a small cloud (think hyperconverged with
> maybe two nodes total) it's not viable to dedicate an entire node to
> just RT loads.
>
> I'd personally rather see nova able to handle a mix of RT and non-RT
> than need to run multiple nova instances on the same node and figure
> out an up-front split of resources between RT nova and non-RT nova.
> Better to allow nova to dynamically allocate resources as needed.
>
> Chris

I am with you, except for the "dynamically" part. That is something
one can think about once the "static" case works.

Henning
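
PS, to make the nova.conf side of the set-vs-mask question concrete:
neither option exists in nova today, and the option names and CPU
numbers below are purely illustrative, but the two spellings being
debated would look roughly like this:

    [DEFAULT]
    # exists today: pCPUs nova may hand out to guest vCPUs
    vcpu_pin_set = 2-7

    # "set" spelling: explicitly list the pCPUs that should run the
    # emulator/io threads of realtime instances
    rt_emulator_vcpu_pin_set = 2,3

    # "mask" spelling (one reading of the proposal): carve pCPUs 2-3
    # out of vcpu_pin_set and pack the emulator threads of all guests
    # onto them
    cpu_emulator_threads_mask = ^2-3

Both describe the same physical layout; the disagreement is only
about which spelling is easier to read and to keep consistent with
the existing options.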