% Heterogeneous Multi Processing Support in Xen
% Revision 1

\clearpage
# Basics

---------------- ------------------------
         Status: **Design Document**

Architecture(s): x86, arm

   Component(s): Hypervisor and toolstack
---------------- ------------------------

# Overview

HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
refer to systems where the physical CPUs are not all equal. They may have
different processing power or capabilities, or each may be specifically
designed to run a particular system component. Often the CPUs have different
Instruction Set Architectures (ISAs) or Application Binary Interfaces (ABIs),
but they may *just* be different implementations of the same ISA, in which
case they typically differ in speed, power efficiency, or handling of special
things (e.g., errata).

An example is ARM big.LITTLE, which is, in fact, the use case that got the
discussion about HMP started. This document, however, is generic, and does
not target only big.LITTLE.

What needs proper Xen support are systems and use cases where virtual CPUs
cannot be seamlessly moved around all the physical CPUs. In these cases,
there must be a way to:

* decide and specify on what (set of) physical CPU(s) each vCPU can execute;
* enforce that a vCPU that can only run on a certain (set of) pCPUs is never
  actually run anywhere else.

**N.B.:** it is becoming common to also refer to systems which have various
kinds of co-processors (from crypto engines to graphic hardware), integrated
with the CPUs on the same chip, as AMP or HMP. This is not what this design
document is about.

# Classes of CPUs

A *class of CPUs* is defined as follows:

1. each pCPU in the system belongs to a class;
2. a class can consist of one or more pCPUs;
3. each pCPU can only be in one class;
4. CPUs belonging to the same class are homogeneous enough that a virtual
   CPU that blocks/is preempted while running on a pCPU of a class can,
   **seamlessly**, unblock/be scheduled on any pCPU of that same class;
5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
   means that the vCPU can run on all the pCPUs belonging to the said
   class(es).

So, for instance, say that in architecture Foobar two classes of CPUs exist,
class foo and class bar. If a virtual CPU running on CPU 0, which is of class
foo, blocks (or is preempted), it can, when it unblocks (or is selected by
the scheduler to run again), run on CPU 3, still of class foo, but not on
CPU 6, which is of class bar.

## Defining classes

How a class is defined, i.e., what are the specific characteristics that
determine which CPUs belong to which class, is highly architecture specific.

### x86

There is no HMP platform of relevance, for now, in the x86 world. Therefore,
only one class will exist, and all the CPUs will be set to belong to it.

**TODO X86:** is this correct?

### ARM

**TODO ARM:** I know nothing about what specifically should be used to form
classes, so I'm deferring this to ARM people. So far, in the original thread,
the following ideas came up (well, there's more, but I don't know enough of
ARM to judge what is really relevant about this topic):

* [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html)
  "I don't think an hardcoded list of processor in Xen is the right solution.
  There are many existing processors and combinations for big.LITTLE so it
  will nearly be impossible to keep updated."
* [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html)
  "Well, before trying to do something clever like that (i.e naming "big" and
  "little"), we need to have upstreamed bindings available to acknowledge the
  difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
  know any static ACPI tables providing the similar information."
* [Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html)
  "For how to differentiate cpus, I am looking the linaro eas cpu topology
  code"

# User details

## Classes of CPUs for the users

It will be possible, in a VM config file, to specify the (set of) class(es)
of each vCPU. This allows creating HMP VMs. E.g., on ARM, it will be possible
to create big.LITTLE VMs which, if run on big.LITTLE hosts, could leverage
the big.LITTLE support of the guest OS kernel and tools.

For such purpose, a new option will be added to the xl config file:

    vcpus = "8"
    vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]

with the following meaning:

* vCPUs 0, 1, 2 can only run on pCPUs of class class0;
* vCPUs 3, 4 can run on pCPUs of class class1 **and** on pCPUs of class class3;
* vCPU 5 can run on pCPUs of class class0 **and** on pCPUs of class class2;
* for vCPUs 6 and 7, since they're not mentioned, the default applies;
* vCPU 8 can only run on pCPUs of class class4.

For the vCPUs for which no class is specified, the default behavior applies.

**TODO:** note that I think it must be possible to associate more than one
class to a vCPU. This is expressed in the example above, and assumed to be
true throughout the document. It might be, though, that, at least at early
stages (see implementation phases below), we will enable only 1-to-1 mapping.

**TODO:** the default can be either:

1. the vCPU can run on any CPU of any class,
2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd
   say that should be class 0).

The former seems the better interface. It looks to me like the most natural
and least surprising, from the user's point of view, and the most future
proof (see phase 3 of the implementation below). The latter may be more
practical, though. In fact, with the former, we risk crashing (the guest or
the hypervisor) if one creates a VM and forgets to specify the vCPU classes
--which does not look ideal.

It will be possible to gather information about what classes exist, and what
pCPUs belong to each class, by issuing the `xl info -n` command:

    cpu_topology           :
    cpu:    core    socket     node    class
      0:       0         1        0        0
      1:       0         1        0        1
      2:       1         1        0        2
      3:       1         1        0        3
      4:       9         1        0        3
      5:       9         1        0        0
      6:      10         1        0        1
      7:      10         1        0        2
      8:       0         0        1        3
      9:       0         0        1        3
     10:       1         0        1        1
     11:       1         0        1        0
     12:       9         0        1        1
     13:       9         0        1        0
     14:      10         0        1        2
     15:      10         0        1        2

**TODO:** do we want to keep using `-n`, or add another switch, like `-c` or
something? I'm not sure I like using `-n` as, e.g., on x86, this would most
of the time result in just a column full of `0`, and it may raise confusion
among users about what that actually means. Also, do we want to print the
class ids, or some more abstract class names? (Or support both, and have a
way to decide which one to see?)

# Technical details

## Hypervisor

The hypervisor needs to know within which class each of the present CPUs
falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
a list of classes is constructed, and the mapping between each CPU and the
class to which it is determined it should belong is established. The list of
classes is kept ordered from the most powerful to the least powerful class.
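As a purely illustrative sketch (not actual Xen code), the bringup path could
record this information roughly as follows. Here `arch_get_cpu_class()` is a
hypothetical per-architecture hook, and the two arrays are simplified stand-ins
for the data structures proposed below (a plain `uint64_t` replaces
`cpumask_t`, so at most 64 pCPUs):

    /*
     * Sketch only, not actual Xen code: recording the class of a pCPU at
     * bringup time.  arch_get_cpu_class() is a hypothetical hook that each
     * architecture implements by inspecting the just-identified CPU.
     */
    #include <stdint.h>

    #define NR_CPUS 64

    uint16_t arch_get_cpu_class(unsigned int cpu);   /* hypothetical */

    static uint16_t cpu_to_class[NR_CPUS];           /* CPU -> class id      */
    static uint64_t class_to_cpumask[NR_CPUS];       /* class -> mask of CPUs */

    static void record_cpu_class(unsigned int cpu)
    {
        uint16_t cls = arch_get_cpu_class(cpu);

        cpu_to_class[cpu] = cls;
        class_to_cpumask[cls] |= UINT64_C(1) << cpu; /* add cpu to its class */
    }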
**TODO:** this ordering has been [proposed by George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
I like the idea, what do others think? If we agree on that, note that there
has been no discussion on defining what "more powerful" means, neither on x86
(although not really that interesting, for now, I'd say), nor on ARM.

The mapping between CPUs and classes will be kept in memory in the following
data structures:

    uint16_t cpu_to_class[NR_CPUS] __read_mostly;
    cpumask_t class_to_cpumask[NR_CPUS] __read_mostly;

**TODO:** it's probably better to allocate the cpumask array dynamically, to
avoid wasting too much space.

**TODO:** if we want the ordering, the structure needs to be kept ordered too
(or additional structures should be used for the purpose).

Each virtual CPU must know what class(es) of CPUs it can run on. Since a vCPU
can be associated to more than one class, the best way to keep track of this
information is a bitmap. That will be a new `cpumask`-typed member in
`struct vcpu`, where the i-th bit set means the vCPU can run on CPUs of
class i.

If a vCPU is found running on a pCPU of a class that is not associated to the
vCPU itself, an exception should be raised.

**TODO:** What kind? BUG_ON? Crash the guest? The guest would probably crash
--or become unreliable-- on its own, I guess.

Setting and getting the CPU class of a vCPU will happen via two new
hypercalls:

* `XEN_DOMCTL_setvcpuclass`
* `XEN_DOMCTL_getvcpuclass`

Information about CPU classes will be propagated to the toolstack by adding a
new field in xen_sysctl_cputopo, which will become:

    struct xen_sysctl_cputopo {
        uint32_t core;
        uint32_t socket;
        uint32_t node;
        uint32_t class;
    };

For homogeneous and SMP systems, the value of the new class field will be 0
for all the cores.

## Toolstack

It will be possible for the toolstack to retrieve from Xen the list of
existing CPU classes, their names, and the information about which class each
present CPU belongs to.

**TODO:** [George suggested](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html)
allowing a richer set of labels, at the toolstack level, and I like the idea
very much. It's not clear to me, though, in what component this list of
names, and the mapping between them and the classes as they're known inside
Xen, should live.

Libxl and libxc interfaces will be introduced for associating a vCPU to a
(set of) class(es):

* `libxl_set_vcpuclass()`, `libxl_get_vcpuclass()`;
* `xc_vcpu_setclass()`, `xc_vcpu_getclass()`.

In libxl, class information will be added to `struct libxl_cputopology`,
which is filled by `libxl_get_cpu_topology()`.

# Implementation

Implementation can proceed in phases.

## Phase 1

Class definition, identification and mapping of CPUs to classes, inside Xen,
will be implemented. And so will the libxc and libxl interfaces for
retrieving such information.

Parsing of the new `vcpuclass` parameter will be implemented in `xl`. The
result of such parsing will then be used as if it were the hard-affinity of
the various vCPUs. That is, we will set the hard-affinity of each vCPU to the
pCPUs that are part of the class(es) the vCPU itself is being assigned,
according to `vcpuclass` (see the sketch below). This would *Just Work(TM)*,
as long as the user does not try to change the hard-affinity during the VM
lifetime (e.g., with `xl vcpu-pin`).
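Just to make the phase 1 behaviour concrete, here is a minimal sketch (again,
not actual code, and reusing the simplified `class_to_cpumask[]` array and
`NR_CPUS` from the earlier sketch): the hard-affinity of a vCPU is simply the
union of the masks of all the classes it has been assigned.

    /*
     * Sketch only: phase 1 reduces a vCPU's class assignment to a plain
     * hard-affinity mask.  vcpu_classes has bit i set if the vCPU was given
     * class i in `vcpuclass`; the returned mask has bit j set if the vCPU
     * may run on pCPU j (same simplified uint64_t masks as above).
     */
    static uint64_t classes_to_hard_affinity(uint64_t vcpu_classes)
    {
        uint64_t affinity = 0;
        unsigned int cls;

        for ( cls = 0; cls < NR_CPUS; cls++ )
            if ( vcpu_classes & (UINT64_C(1) << cls) )
                affinity |= class_to_cpumask[cls];

        return affinity;
    }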
**TODO:** To prevent the user from changing the hard-affinity in this way, it
may be useful to add another `xl` config option that, if set, disallows
changing the affinity from what it was at VM creation time (something like
`immutable_affinity=1`). Thoughts? I'm leaning toward doing that, as it may
even be something useful to have in other use cases.

### Phase 1.5

The library (libxc and libxl) calls and hypercalls necessary to associate a
class to the vCPUs will be implemented. At which point, when parsing
`vcpuclass` in `xl`, we will call both (with the same bitmap as input):

* `libxl_set_vcpuclass()`
* `libxl_set_vcpuaffinity()`

`libxl__set_vcpuaffinity()` will be modified in such a way that, when setting
hard-affinity for a vCPU:

* it will get the CPU class(es) associated to the vCPU;
* it will check which pCPUs belong to the class(es);
* it will filter out, from the new hard-affinity being set, the pCPUs that
  are not in the vCPU's class(es).

As a safety measure, `vcpu_set_hard_affinity()` in Xen will also be modified
such that, if someone somehow manages to pass down a hard-affinity mask which
contains pCPUs outside the proper classes, it will error out with -EINVAL;
a sketch of that check follows.
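The check could look roughly like this (a sketch only, based on the simplified
`uint64_t` masks used in the previous sketches rather than Xen's real
`cpumask_t`; `vcpu_class_cpumask()` is a hypothetical helper returning the
union of the masks of the vCPU's classes):

    /*
     * Sketch only, not the real vcpu_set_hard_affinity(): reject a
     * hard-affinity mask containing pCPUs outside the classes associated
     * with the vCPU.
     */
    #include <errno.h>
    #include <stdint.h>

    struct vcpu;                                        /* opaque here     */
    uint64_t vcpu_class_cpumask(const struct vcpu *v);  /* hypothetical    */

    int check_hard_affinity(const struct vcpu *v, uint64_t new_affinity)
    {
        uint64_t allowed = vcpu_class_cpumask(v);

        if ( new_affinity & ~allowed )
            return -EINVAL;  /* some pCPU is outside the vCPU's classes */

        return 0;
    }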
### Phase 2

Inside Xen, the various schedulers will be modified to deal internally with
the fact that vCPUs can only run on pCPUs from the class(es) they are
associated with. This allows for a more efficient implementation, and paves
the way for enabling more intelligent logic (e.g., for minimizing power
consumption) in *phase 3*.

Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
called).

### Phase 3

Moving vCPUs between classes will be implemented. This means that, e.g., on
ARM big.LITTLE, it will be possible for a vCPU to block on a big core and
wake up on a LITTLE core.

**TODO:** About what this takes, see [Julien's email](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02345.html).

This means it will no longer be necessary to specify the class of the vCPUs
via `vcpuclass` in `xl`, although that will of course remain supported. So:

1. if one wants (sticking with big.LITTLE as the example) a big.LITTLE VM,
   and wants to make sure that big vCPUs will run on big pCPUs, and that
   LITTLE vCPUs will run on LITTLE pCPUs, she will use:

        vcpus = "8"
        vcpuclass = ["0-3:big", "4-7:little"]

2. if one does not care, and is happy to let the Xen scheduler decide where
   to run the various vCPUs, in order, for instance, to get the best power
   efficiency for the host as a whole, he can just avoid specifying any
   `vcpuclass`, or do something like this:

        vcpuclass = ["all:all"]

# Limitations

* While in *phase 1*, it won't be possible to use vCPU hard-affinity for
  anything other than HMP support;
* until *phase 3*, since HMP support is basically the same as setting
  hard-affinity, performance may not be ideal;
* until *phase 3*, vCPUs can't move between classes. This means, for
  instance, in the big.LITTLE world, Xen's scheduler can't move a vCPU
  running on a big core to a LITTLE core (e.g., to try to save power).

# Testing

Testing requires an actual AMP/HMP system. On such a system, we at least want
to:

* create a VM **without** specifying `vcpuclass` in its config file, and
  check that the default policy is correctly applied to all vCPUs;
* create a VM **specifying** `vcpuclass` in its config file, and check that
  the classes are assigned to vCPUs appropriately;
* create a VM **specifying** `vcpuclass` in its config file, and check that
  the various vCPUs are not running on any pCPU outside of their respective
  classes.

# Areas for improvement

* Make it possible to test even on non-HMP systems. That could be done by
  making it possible to provide Xen with fake CPU classes for the system
  CPUs (e.g., with boot time parameters);
* implement a way to view the class the vCPUs have been assigned (either as
  part of the output of `xl vcpu-list`, or as a dedicated `xl` subcommand);
* make it possible to dynamically change the class of vCPUs at runtime, with
  `xl` (either via a new parameter to the `vcpu-pin` subcommand, or via a new
  subcommand).

# Known issues

*TBD*.

# References

* [Asymmetric Multi Processing](https://en.wikipedia.org/wiki/Asymmetric_multiprocessing)
* [Heterogeneous Multi Processing](https://en.wikipedia.org/wiki/Heterogeneous_computing)
* [ARM big.LITTLE](https://www.arm.com/products/processors/technologies/biglittleprocessing.php)

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2016-12-02 1                 RFC of design document
---------- -------- -------- -------------------------------------------