On Thu, Apr 23, 2015 at 05:32:33PM +1000, David Gibson wrote:
> On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> > We had a call and I was asked to write a summary about our conclusion.
> >
> > The more I wrote, the more I became uncertain if we really came to a
> > conclusion, and more certain that we want to define the QMP/HMP/CLI
> > interfaces first (or quite early in the process).
> >
> > As discussed I will provide an initial document as a discussion starter.
> >
> > So here is my current understanding with each piece of information on
> > one line, so that everybody can correct me or make additions:
> >
> > current wrap-up of architecture support
> > ---------------------------------------
> > x86
> > - Topology possible
> >   - can be hierarchical
> >   - interfaces to query topology
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > - supports CPU hotplug via cpu_add
> >
> > power
> > - Topology possible
> >   - interfaces to query topology?
>
> For power, topology information is communicated via the
> "ibm,associativity" (and related) properties in the device tree. This
> can encode hierarchical topologies, but it is *not* bound to the
> socket/core/thread hierarchy. On the guest side on Power there's no
> real notion of "socket", just cores with specified proximities to
> various memory nodes.
>
> > - SMT: Power8: no threads in host and full core passed in due to HW
> >   design; may change in the future
> >
> > s/390
> > - Topology possible
> >   - can be hierarchical
> >   - interfaces to query topology
> > - always virtualized via PR/SM LPAR
> > - host topology from LPAR can be heterogeneous (e.g. 3 CPUs in 1st
> >   socket, 4 in 2nd)
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> >
> >
> > Current downsides of CPU definitions/hotplug
> > --------------------------------------------
> > - smp, sockets=,cores=,threads= builds only a homogeneous topology
> > - cpu_add does not tell where to add
> > - artificial icc bus construct on x86 for several reasons (link, sysbus
> >   not hotpluggable, ...)
>
> Artificial though it may be, I think having a "cpus" pseudo-bus is not
> such a bad idea.
>
> > discussions
> > -----------
> > - we want to be able to (most important question, IMHO)
> >   - hotplug CPUs on power/x86/s390 and maybe others
> >   - define topology information
> >   - bind the guest topology to the host topology in some way
> >     - to host nodes
> >     - maybe also for gang scheduling of threads (might face reluctance
> >       from the Linux scheduler folks)
> >   - not really deeply outlined in this call
> > - QOM links must be allocated at boot time, but can be set later on
> >   - nothing that we want to expose to users
> >   - machine provides QOM links that the device_add hotplug mechanism
> >     can use to add new CPUs into preallocated slots; "CPUs" can be
> >     groups of cores and/or threads
> > - hotplug and initial config should use the same semantics
> > - cpu and memory topology might be somewhat independent
> >   --> - define nodes
> >       - map CPUs to nodes
> >       - map memory to nodes
> >
> > - hotplug per
> >   - socket
> >   - core
> >   - thread
> >   ?
> >
> > Now comes the part where I am not sure if we came to a conclusion or not:
> > - hotplug/definition per core (but not per thread) seems to handle all
> >   cases
> >   - a core might have multiple threads (and thus multiple CPUStates)
> > - as device statement (or object?)
> > - mapping of CPUs to nodes or defining the topology not really
> >   outlined in this call
> >
> > To be defined:
> > - QEMU command line for initial setup
> > - QEMU hmp/qmp interfaces for dynamic setup
>
> So, I can't say I've entirely got my head around this, but here are my
> thoughts so far.
>
> I think the basic problem here is that the fixed socket -> core ->
> thread hierarchy is something from x86 land that's become integrated
> into qemu's generic code, where it doesn't entirely make sense.
>
> Ignoring NUMA topology (I'll come back to that in a moment), qemu
> should really only care about two things:
>
> a) the unit of execution scheduling (a vCPU or "thread")
> b) the unit of plug/unplug
>
> Now, returning to NUMA topology. What the guest, and therefore qemu,
> really needs to know is the relative proximity of each thread to each
> block of memory. That usually forms some sort of node hierarchy, but
> it doesn't necessarily correspond to a socket->core->thread hierarchy
> you can see in physical units.
>
> On Power, an arbitrary NUMA node hierarchy can be described in the
> device tree without reference to "cores" or "sockets", so really qemu
> has no business even talking about such units.
>
> IIUC, on x86 the NUMA topology is bound up with the socket->core->thread
> hierarchy, so it needs to have a notion of those layers, but ideally
> that would be specific to the pc machine type.
>
> So, here's what I'd propose:
>
> 1) I think we really need some better terminology to refer to the unit
> of plug/unplug. Until someone comes up with something better, I'm
> going to use "CPU Module" (CM), to distinguish from the NUMA baggage
> of "socket" and also to refer more clearly to the thing that goes into
> the socket, rather than the socket itself.
>
> 2) A Virtual CPU Module (vCM) need not correspond to a real physical
> object. For machine types which we want to faithfully represent a
> specific physical machine, it would. For generic or pure virtual
> machines, the vCMs would be as small as possible. So for current
> Power, they'd be one virtual core; for future Power (maybe) or s390, a
> single virtual thread. For x86 I'm not sure what they'd be.
>
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects). Their existence
> would be generic, though we'd almost certainly use arch and/or
> machine specific subtypes.
>
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.
>
> 5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
> "chips" or "MCMs" or whatever, but that would be up to the machine
> type specific code, and not represented in the QOM hierarchy.
>
> 6) Obviously we'd need some backwards compat goo to sort out existing
> command line options referring to cores and sockets into the new
> representation. This will need machine type specific hooks - so for
> x86 it would need to set up the right vCM subdivisions and make sure
> the right NUMA topology info goes into ACPI. For -machine pseries I'm
> thinking that "-smp sockets=2,cores=1,threads=4" and "-smp
> sockets=1,cores=2,threads=4" should result in exactly the same thing
> internally.
>
> Thoughts?
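To make the proposal concrete, points (3), (4), and (6) above can be modelled in a few lines of Python. This is purely an illustrative sketch, not QEMU code: the names VCM, CpusBus, vcpus_of, vcm_of, and pseries_build are invented here for the example, and QEMU's real QOM types and APIs would look quite different.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VCPU:
    index: int          # global vCPU (thread) index

@dataclass
class VCM:
    """Unit of plug/unplug (point 1/2); contains one or more vCPUs."""
    index: int
    vcpus: tuple = ()

class CpusBus:
    """The 'cpus' pseudo-bus (point 3): a flat container of vCMs."""
    def __init__(self):
        self.vcms = []
        self._vcm_by_vcpu = {}

    def plug(self, vcm):
        self.vcms.append(vcm)
        for v in vcm.vcpus:
            self._vcm_by_vcpu[v.index] = vcm

    # Point 4: generic lookups in both directions.
    def vcpus_of(self, vcm):
        return list(vcm.vcpus)

    def vcm_of(self, vcpu_index):
        return self._vcm_by_vcpu[vcpu_index]

def pseries_build(sockets, cores, threads):
    """Point 6 for -machine pseries: with one vCM per core, only
    sockets*cores and threads matter; the socket/core split itself
    leaves no trace in the internal representation."""
    bus = CpusBus()
    next_vcpu = 0
    for cm in range(sockets * cores):
        vcpus = tuple(VCPU(next_vcpu + t) for t in range(threads))
        next_vcpu += threads
        bus.plug(VCM(cm, vcpus))
    return bus

# The two -smp variants from point 6 yield identical internal state.
a = pseries_build(sockets=2, cores=1, threads=4)
b = pseries_build(sockets=1, cores=2, threads=4)
assert [(m.index, [v.index for v in m.vcpus]) for m in a.vcms] == \
       [(m.index, [v.index for v in m.vcpus]) for m in b.vcms]
```

A per-core hotplug then becomes a single `bus.plug(...)` of a new vCM, and a machine type wanting per-thread granularity just builds vCMs containing one vCPU each.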
I should add - the terminology's a bit different, but I think in terms
of code this should be very similar to the "socket" approach previously
proposed.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson