On Tue, 27 Jun 2017 09:28:34 -0600, Chris Friesen <chris.frie...@windriver.com> wrote:
> On 06/27/2017 01:45 AM, Sahid Orentino Ferdjaoui wrote:
> > On Mon, Jun 26, 2017 at 12:12:49PM -0600, Chris Friesen wrote:
> >> On 06/25/2017 02:09 AM, Sahid Orentino Ferdjaoui wrote:
> >>> On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen wrote:
> >>>> On 06/23/2017 09:35 AM, Henning Schild wrote:
> >>>>> On Fri, 23 Jun 2017 11:11:10 +0200, Sahid Orentino Ferdjaoui
> >>>>> <sferd...@redhat.com> wrote:
> >>>>
> >>>>>> In the Linux RT context, and as you mentioned, the non-RT vCPU can
> >>>>>> acquire some guest kernel lock and then be pre-empted by the
> >>>>>> emulator thread while holding this lock. This situation blocks the
> >>>>>> RT vCPUs from doing their work. That is why we have implemented [2].
> >>>>>> For DPDK I don't think we have such problems because it's running
> >>>>>> in userland.
> >>>>>>
> >>>>>> So for the DPDK context I think we could have a mask like we have
> >>>>>> for RT, basically considering vCPU0 to handle best-effort work
> >>>>>> (emulator threads, SSH...). I think that is the current pattern
> >>>>>> used by DPDK users.
> >>>>>
> >>>>> DPDK is just a library, and one can imagine an application with
> >>>>> cross-core communication/synchronisation needs where the emulator
> >>>>> slowing down vCPU0 will also slow down vCPU1. Your DPDK application
> >>>>> would have to know which of its cores did not get a full pCPU.
> >>>>>
> >>>>> I am not sure what the DPDK example is doing in this discussion;
> >>>>> would that not just be cpu_policy=dedicated? I guess the normal
> >>>>> behaviour of dedicated is that emulators and I/O happily share pCPUs
> >>>>> with vCPUs, and you are looking for a way to restrict emulators/io
> >>>>> to a subset of pCPUs because you can live with some of them not
> >>>>> being 100%.
> >>>>
> >>>> Yes. A typical DPDK-using VM might look something like this:
> >>>>
> >>>> vCPU0: non-realtime, housekeeping and I/O, handles all virtual
> >>>>        interrupts and "normal" linux stuff, emulator runs on same pCPU
> >>>> vCPU1: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU2: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU3: realtime, runs in tight loop in userspace processing packets
> >>>>
> >>>> In this context, vCPUs 1-3 don't really ever enter the kernel, and
> >>>> we've offloaded as much kernel work as possible from them onto vCPU0.
> >>>> This works pretty well with the current system.
> >>>>
> >>>>>> For RT we have to isolate the emulator threads to an additional
> >>>>>> pCPU per guest or, as you are suggesting, to a set of pCPUs for all
> >>>>>> the guests running.
> >>>>>>
> >>>>>> I think we should introduce a new option:
> >>>>>>
> >>>>>>   - hw:cpu_emulator_threads_mask=^1
> >>>>>>
> >>>>>> If set in 'nova.conf', that mask will be applied to the set of all
> >>>>>> host CPUs (vcpu_pin_set) to basically pack the emulator threads of
> >>>>>> all VMs running there (useful for the RT context).
> >>>>>
> >>>>> That would allow modelling exactly what we need.
> >>>>> In nova.conf we are talking about absolute, known values, so there
> >>>>> is no need for a mask, and a set is much easier to read. Also, using
> >>>>> the same name does not sound like a good idea. The name vcpu_pin_set
> >>>>> clearly suggests what kind of load runs there; if we used a mask it
> >>>>> should be called pin_set.
> >>>>
> >>>> I agree with Henning.
> >>>>
> >>>> In nova.conf we should just use a set, something like
> >>>> "rt_emulator_vcpu_pin_set", which would be used for running the
> >>>> emulator/io threads of *only* realtime instances.
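Just to make the mask/set discussion concrete, here is a rough sketch in
plain Python (not Nova code) of how a mask such as "^1" could be combined
with vcpu_pin_set to derive the pCPUs that take the emulator threads. The
option name and the exact semantics are only the ones proposed above,
nothing merged, and this is just how I read the example.

  # Illustration only: hw:cpu_emulator_threads_mask and its semantics are
  # just the proposal from this thread, not an existing Nova option.

  def apply_mask(base_cpus, mask):
      """Apply a mask such as "^1" or "0-2,^1" on top of a base CPU set.

      Plain entries select CPUs out of the base set, "^"-prefixed entries
      remove them; "^1" alone therefore means "everything in the base set
      except CPU 1" (at least that is how I read the example above).
      """
      selected, removed = set(), set()
      for part in mask.split(","):
          part = part.strip()
          negate = part.startswith("^")
          if negate:
              part = part[1:]
          if "-" in part:
              lo, hi = part.split("-")
              cpus = set(range(int(lo), int(hi) + 1))
          else:
              cpus = {int(part)}
          (removed if negate else selected).update(cpus)
      result = (selected & set(base_cpus)) if selected else set(base_cpus)
      return result - removed

  # vcpu_pin_set=2-7 on the host, mask "^2": the emulator threads of all
  # guests would be packed onto 3-7, keeping pCPU 2 free of them.
  print(apply_mask({2, 3, 4, 5, 6, 7}, "^2"))   # {3, 4, 5, 6, 7}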
> >>>
> >>> I don't agree with you: we have a set of pCPUs and we want to
> >>> subtract some of them for the emulator threads. We need a mask. The
> >>> only set we need is the one selecting which pCPUs Nova can use
> >>> (vcpu_pin_set).
> >>>
> >>>> We may also want to have "rt_emulator_overcommit_ratio" to control
> >>>> how many threads/instances we allow per pCPU.
> >>>
> >>> I'm not really sure I understand this point. If it is to indicate
> >>> that for an isolated pCPU we want X guest emulator threads, the same
> >>> behavior is achieved by the mask. A host for realtime is dedicated to
> >>> realtime, with no overcommitment, and the operators know the number
> >>> of host CPUs, so they can easily deduce a ratio and hence the
> >>> corresponding mask.
> >>
> >> Suppose I have a host with 64 CPUs. I reserve three for host overhead
> >> and networking, leaving 61 for instances. If I have instances with one
> >> non-RT vCPU and one RT vCPU then I can run 30 instances. If instead my
> >> instances have one non-RT and 5 RT vCPUs then I can run 12 instances.
> >> If I put all of my emulator threads on the same pCPU, it might make a
> >> difference whether I put 30 sets of emulator threads or 12 sets on it.
> >
> > Oh, I understand your point now, but I'm not sure it is going to make
> > any difference. I would say the load on the isolated cores is probably
> > going to be the same, although there will be some overhead from the
> > number of threads handled, which will be slightly higher in your first
> > scenario.
>
> >> The proposed "rt_emulator_overcommit_ratio" would simply say "nova is
> >> allowed to run X instances' worth of emulator threads on each pCPU in
> >> rt_emulator_vcpu_pin_set". If we've hit that threshold, then no more
> >> RT instances are allowed to schedule on this compute node (but non-RT
> >> instances would still be allowed).
> >
> > Also, I don't think we want to schedule where exactly the emulator
> > threads of the guests should be pinned within the isolated cores. We
> > will let them float on the set of isolated cores. If there is a
> > requirement to have them pinned, the current implementation is
> > probably enough.
>
> Once you use "isolcpus" on the host, the host scheduler won't "float"
> threads between the CPUs based on load. To get the float behaviour you'd
> have to not isolate the pCPUs that will be used for emulator threads,
> but then you run the risk of the host running other work on those pCPUs
> (unless you use cpusets or something to isolate the host work to a
> subset of non-isolcpus pCPUs).

With OpenStack you use libvirt, and libvirt uses cgroups/cpusets to get
those threads onto those cores.

Henning

> Chris
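Below is a rough sketch of the overcommit check Chris describes above,
just so we are all reading the proposal the same way. The names
rt_emulator_vcpu_pin_set and rt_emulator_overcommit_ratio are only the
ones floated in this thread; nothing like this exists in Nova today.

  # Sketch only: the option names are taken from the proposal in this
  # thread and are not real Nova configuration options.

  def can_schedule_rt_instance(rt_emulator_vcpu_pin_set,
                               rt_emulator_overcommit_ratio,
                               rt_instances_on_host):
      """Return True if one more RT instance's emulator threads fit.

      The ratio is read as "instances' worth of emulator threads allowed
      per pCPU in the pin set"; non-RT instances are not counted at all,
      matching Chris' description above.
      """
      capacity = len(rt_emulator_vcpu_pin_set) * rt_emulator_overcommit_ratio
      return rt_instances_on_host + 1 <= capacity

  # One pCPU (say, core 60) set aside for RT emulator threads. With a
  # ratio of 30 the 30th small instance still fits; with a ratio of 12 a
  # 13th of the larger instances does not.
  print(can_schedule_rt_instance({60}, 30, 29))   # True
  print(can_schedule_rt_instance({60}, 12, 12))   # False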