On Wed, Apr 03, 2013 at 10:09:25PM +0200, Igor Mammedov wrote:
> On Wed, 3 Apr 2013 16:27:11 -0300
> Eduardo Habkost <ehabk...@redhat.com> wrote:
>
> > On Wed, Apr 03, 2013 at 08:59:07PM +0200, Igor Mammedov wrote:
> > > On Wed, 3 Apr 2013 15:10:05 -0300
> > > Eduardo Habkost <ehabk...@redhat.com> wrote:
> > >
> > > > On Wed, Apr 03, 2013 at 07:58:00PM +0200, Igor Mammedov wrote:
> > > > <snip>
> > > > > > > +void do_cpu_hot_add(const int64_t id, Error **errp)
> > > > > > > +{
> > > > > > > +    pc_new_cpu(saved_cpu_model, id, errp);
> > > > > > > +}
> > > > > > > +
> > > > > >
> > > > > > Missing x86_cpu_apic_id_from_index(id)?
> > > > > There was(is?) opposition to using cpu_index to identify x86 CPU.
> > > >
> > > > Really? Do you have a pointer to the discussion?
> > > Here is what I could find in my mail box:
> > > http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02770.html
> > > Jan could correct me if I'm wrong.
> >
> > So, quoting Jan:
> > > From my POV, cpu_index could become equal to the physical APIC ID.
> > > As long as we can set it freely (provided it remains unique) and
> > > non-continuously, we don't need separate indexes.
> >
> > We can't choose APIC IDs freely, because the APIC ID is calculated based
> > on the CPU topology (socket + core + thread IDs).
> >
> > So, the cpu_index could be the same as the APIC ID if the cpu_index
> > value were declared opaque, being just a "CPU identifier" that is chosen
> > by QEMU arbitrarily (and that happens to match the APIC ID). But we will
> > probably have some problems with this:
> >
> > - The CPU indexes are currently allocated contiguously, and probably
> >   existing interfaces already assume that (e.g. the "-numa" option,
> >   "info numa", "info cpus" and maybe other monitor commands).
> > - QEMU must be responsible for calculating the APIC ID of each CPU,
> >   because it is based on the CPU topology.
> > - If QEMU is the one who calculates the APIC ID, what kind of identifier
> >   can we use for a CPU object on the command-line (e.g. in the "-numa"
> >   option)?
> Using any kind of thread ID is problematic, since 2 threads from the same
> core could end up on different nodes.
> Maybe the placement interface could be better described as
> node[n]=sockets_list?
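To make the thread-splitting concern concrete, here is a rough C sketch. It assumes contiguous CPU indexes are handed out thread-first (consecutive indexes fill one core, then the next core, then the next socket). It also treats each "-numa node,cpus=FIRST-LAST" option as a plain index range. The struct and function names are hypothetical and are not QEMU's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Socket/core/thread position of one CPU (hypothetical names). */
struct topo_ids {
    unsigned pkg_id;
    unsigned core_id;
    unsigned thread_id;
};

/* One "-numa node,cpus=FIRST-LAST" option, modelled as an index range
 * (a hypothetical representation, not QEMU's internal one). */
struct numa_range {
    unsigned first;
    unsigned last;
    int node;
};

/* Decompose a contiguous cpu_index, assuming the threads of a core get
 * consecutive indexes, then cores within a socket, then sockets. */
static struct topo_ids topo_ids_from_index(unsigned nr_cores,
                                           unsigned nr_threads,
                                           unsigned cpu_index)
{
    struct topo_ids ids;

    assert(nr_cores > 0 && nr_threads > 0);
    ids.thread_id = cpu_index % nr_threads;
    ids.core_id = (cpu_index / nr_threads) % nr_cores;
    ids.pkg_id = cpu_index / nr_threads / nr_cores;
    return ids;
}

/* Return the NUMA node whose cpus= range contains cpu_index, or -1. */
static int node_for_index(const struct numa_range *ranges, size_t nr_ranges,
                          unsigned cpu_index)
{
    size_t i;

    for (i = 0; i < nr_ranges; i++) {
        if (cpu_index >= ranges[i].first && cpu_index <= ranges[i].last) {
            return ranges[i].node;
        }
    }
    return -1;
}
```

Take "-smp 9,cores=3,threads=3 -numa node,cpus=0-4 -numa node,cpus=5-8" under this assumed allocation order. Indexes 3, 4 and 5 are threads 0, 1 and 2 of core 1. However, indexes 3 and 4 land on node 0 and index 5 lands on node 1. Nothing in an index-based interface forbids that split, so any replacement identifier has to preserve it.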

The problem here is compatibility: we need to keep existing command-lines
working. And the existing interface (among other things) doesn't prevent
two threads from being in different NUMA nodes. If today
"-smp 9,cores=3,threads=3 -numa node,cpus=0-4 -numa node,cpus=5-8" has a
specific meaning, we need to keep the same meaning.

>
> >
> > - We may need to redefine the meaning of the "maxcpus" -smp option, if
> >   all our interfaces are now based on non-contiguous and freely-set CPU
> >   identifiers.
> It's the amount of CPUs available to the guest, pretty clear from the
> user's POV.

I was just worrying that there could be assumptions like "maxcpus is
always > cpu_index". But you are probably right.

>
> >
> > In short, getting rid of the contiguous CPU indexes sounds very
> > difficult. We could introduce other kinds of identifiers, but we will
> > probably need to keep the CPU indexes contiguous to keep existing
> > interfaces working.
> Once we have CPU unplug, we will have non-contiguous cpu_index. So it will
> be part of the CPU unplug series to fix cpu_index allocation/usage where
> necessary.

Keeping compatibility after CPU unplug is not a problem, as CPU unplug
doesn't exist yet. The problem here is to have a reliable identifier for
CPUs that can be used on the command-line. The only identifier we have
for that today is a contiguous CPU index, and if we make CPU indexes
non-contiguous we are going to break existing command-lines that use
them (e.g. using "-numa").

>
> >
> > > > > So, it is expected from management to provide APIC ID instead of
> > > > > cpu_index.
> > > > > It could be useful to make hotplug to a specific NUMA node/cpu
> > > > > work in the future.
> > > > > Though an interface for discovering possible APIC IDs is not part
> > > > > of this series.
> > > >
> > > > That's exactly the opposite of what I expect. The APIC ID is an internal
The APIC ID is an internal > > > > implementation detail, and external tools must _not_ be required to deal > > > > with it and to calculate it. > > > > > > > > Communication with the BIOS, on the other hand, is entirely based on the > > > > APIC ID, and not CPU indexes. So QEMU needs to translate the CPU indexes > > > > (used to communicate with the outside world) to APIC IDs when talking to > > > > the BIOS. > > > cpu_index won't work nicely with hot-adding CPU to specific numa node > > > though. > > > > Well, the "-numa node" options are already based on CPU indexes, so it > > would match it the existing NUMA configuration interface. > > > > > with APIC ID (mgmt might treat it as opaque) we could expose something > > > like > > > > > > /machine/icc-bridge/link<CPU[apic_id_n] > > > ... > > > > > > for all possible CPUs, with empty links for non existing ones. > > > > > > and later add on something like this: > > > > > > /machine/numa_node[x]/link<CPU[apic_id_n]> > > > ... > > > > > > Libvirt than could just pickup ready apic id from desired place and add > > > CPU > > > either using cpu-add id=xxx or device_add x86-cpu-...,apic_id=xxx > > > > > > +1 more cpu_index is QEMU implementation detail and we could not add to > > > x86 CPU > > > cpu-index property since hardware doesn't have such feature, so it won't > > > be > > > available with device_add. > > > > I don't mind hiding cpu_index too. I don't mind if we use a cpu_index, > > QOM links, arbitrary IDs set by the user. I just have a problem with > > requiring libvirt to set the APIC ID. > > > > If you give libvirt an easy way to convert a CPU "location" (index, numa > > node, whatever) to an APIC ID that is pre-calculated by QEMU, then it > > could work. But do we really need to require libvirt to deal with APIC > > ID directly? If you just set the links properly to reflect the CPU > > "location", the CPU could calculate its APIC ID based on its "location" > > using the links. 
> What about adding a CPU to a specific node then? It would require an
> interface for telling the CPU which node it should be plugged into (part
> of the APIC ID, I guess).

Using QOM we could just use links. The question to me is how to identify
the CPU "location" reliably if we're going to use it in a "cpu-set"
interface. My point is that cpu_index works perfectly for that (as long
as the rules about how each CPU index is allocated to each NUMA node,
socket, core, and thread are kept stable). Later we can have something
not based on CPU indexes, if we move to a 100% link-based QOM interface.

Having to ask QEMU for the APIC ID somehow and requiring the APIC ID to
be provided on the cpu-set command could work, yes. But it must not
require libvirt to calculate and choose the APIC IDs itself.

(Note that I didn't review all the code yet. Maybe you are already doing
all that.)

-- 
Eduardo