Eduardo, Igor, On 10/08/19 12:52, Laszlo Ersek wrote: > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware, > due to historical reasons. That value is not useful to edk2, however. For > supporting VCPU hotplug, edk2 needs: > > - the boot CPU count (already exposed in FW_CFG_NB_CPUS), > > - and the maximum foreseen CPU count (tracked in > "MachineState.smp.max_cpus", but not currently exposed). > > Add a new fw-cfg file to expose "max_cpus". > > While at it, expose the rest of the topology too (die / core / thread > counts), because I expect that the VCPU hotplug feature for OVMF will > ultimately need those too, and the data size is not large.
In fact, it seems like OVMF will have to synthesize the new (hot-plugged) VCPU's *APIC-ID* from the following information sources: - the topology information described above (die / core / thread counts), and - the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt). Now, if I understand correctly, the "CPU selector" ([0x0-0x3]) specifies a CPU *index*. Therefore, in the hotplug SMI handler (running on one of the pre-existent CPUs), OVMF will have to translate the new CPU's selector (the new CPU's *index*) to its *APIC-ID*, based on the topology information (numbers of dies / cores / threads). (That's because existent SMM infrastructure in edk2 uses the initial APIC-ID as the key for referencing CPUs.) Algorithmically, I think this translation is doable in OVMF -- after all, it is implemented in the x86_apicid_from_cpu_idx() function already, in "include/hw/i386/topology.h". And that function does not need more information either: static inline apic_id_t x86_apicid_from_cpu_idx(unsigned nr_dies, unsigned nr_cores, unsigned nr_threads, unsigned cpu_index) Therefore, my plan is to implement the same translation logic in OVMF. Now, the questions: - Am I right to think that the "CPU selector" register in the "modern" ACPI hotplug interface operates in the *same domain* as the "cpu_index" parameter of x86_apicid_from_cpu_idx()? - As we progress through CPU indices, x86_apicid_from_cpu_idx() first fills threads in a core, then cores in a die, then dies in a socket. Will this logic remain the same, forever? If any of the two questions above is answered with "no", then OVMF will need an fw_cfg blob that is different from the current proposal. Namely, OVMF will need a *full* "cpu_index -> APIC-ID" map, up to (and excluding) "max_cpus". The pc_possible_cpu_arch_ids() function in "hw/i386/pc.c" already calculates a similar map: ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(pcms, i); So, basically that map is what OVMF would have to receive over fw_cfg, *if* the "cpu_index -> APIC-ID" mapping is not considered guest ABI. Should I write v2 for that? Please comment! Thanks, Laszlo > This is > slightly complicated by the fact that the die count is specific to > PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see > commit 149c50cabcc4). > > For now, the feature is temporarily disabled. > > Cc: "Michael S. Tsirkin" <m...@redhat.com> > Cc: Eduardo Habkost <ehabk...@redhat.com> > Cc: Igor Mammedov <imamm...@redhat.com> > Cc: Marcel Apfelbaum <marcel.apfelb...@gmail.com> > Cc: Paolo Bonzini <pbonz...@redhat.com> > Cc: Philippe Mathieu-Daudé <phi...@redhat.com> > Cc: Richard Henderson <r...@twiddle.net> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515 > Signed-off-by: Laszlo Ersek <ler...@redhat.com> > --- > hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++- > hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++-- > hw/i386/pc.c | 4 ++-- > 3 files changed, 55 insertions(+), 5 deletions(-) > > diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h > index e0856a376996..d742435b9793 100644 > --- a/hw/i386/fw_cfg.h > +++ b/hw/i386/fw_cfg.h > @@ -18,9 +18,37 @@ > #define FW_CFG_E820_TABLE (FW_CFG_ARCH_LOCAL + 3) > #define FW_CFG_HPET (FW_CFG_ARCH_LOCAL + 4) > > +/** > + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over > fw-cfg. > + * > + * All fields have little-endian encoding. > + * > + * @dies: Number of dies per package (aka socket). Set it to 1 unless the > + * concrete MachineState subclass defines it differently. > + * @cores: Corresponds to @CpuTopology.@cores. > + * @threads: Corresponds to @CpuTopology.@threads. > + * @max_cpus: Corresponds to @CpuTopology.@max_cpus. > + * > + * Firmware can derive the package (aka socket) count with the following > + * formula: > + * > + * DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads) > + * > + * Firmware can derive APIC ID field widths and offsets per the standard > + * calculations in "include/hw/i386/topology.h". > + */ > +typedef struct FWCfgX86Topology { > + uint32_t dies; > + uint32_t cores; > + uint32_t threads; > + uint32_t max_cpus; > +} QEMU_PACKED FWCfgX86Topology; > + > FWCfgState *fw_cfg_arch_create(MachineState *ms, > uint16_t boot_cpus, > - uint16_t apic_id_limit); > + uint16_t apic_id_limit, > + unsigned smp_dies, > + bool expose_topology); > void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg); > void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg); > > diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c > index 39b6bc60520c..33d09875014f 100644 > --- a/hw/i386/fw_cfg.c > +++ b/hw/i386/fw_cfg.c > @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState > *fw_cfg) > } > } > > +static void fw_cfg_expose_topology(FWCfgState *fw_cfg, > + unsigned dies, > + unsigned cores, > + unsigned threads, > + unsigned max_cpus) > +{ > + FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1); > + > + topo->dies = cpu_to_le32(dies); > + topo->cores = cpu_to_le32(cores); > + topo->threads = cpu_to_le32(threads); > + topo->max_cpus = cpu_to_le32(max_cpus); > + fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo); > +} > + > FWCfgState *fw_cfg_arch_create(MachineState *ms, > - uint16_t boot_cpus, > - uint16_t apic_id_limit) > + uint16_t boot_cpus, > + uint16_t apic_id_limit, > + unsigned smp_dies, > + bool expose_topology) > { > FWCfgState *fw_cfg; > uint64_t *numa_fw_cfg; > @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms, > (1 + apic_id_limit + nb_numa_nodes) * > sizeof(*numa_fw_cfg)); > > + if (expose_topology) { > + fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores, > + ms->smp.threads, ms->smp.max_cpus); > + } > + > return fw_cfg; > } > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index bcda50efcc23..bb72b12edad2 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms, > option_rom_mr, > 1); > > - fw_cfg = fw_cfg_arch_create(machine, > - pcms->boot_cpus, pcms->apic_id_limit); > + fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, > pcms->apic_id_limit, > + pcms->smp_dies, false); > > rom_set_fw(fw_cfg); > >