On Mon, 14 Oct 2024 20:22:02 +0100 Salil Mehta <salil.me...@huawei.com> wrote:
> Certain CPU architecture specifications [1][2][3] prohibit changes to CPU ^^^^^^^^^ these do not point to specs but never mind > presence after the kernel has booted. This limitation exists because many > system > initializations rely on the exact CPU count at boot time and do not expect it > to > change later. For example, components like interrupt controllers, which are > closely tied to CPUs, or various per-CPU features, may not support > configuration > changes once the kernel has been initialized. This presents a challenge for > virtualization features such as vCPU hotplug. well, x86 (incl cpu,interrupt ctrls) also doesn't have architectural hotplug. It's just OEM managed to implement it regardless and then bothered to make OS changes to work with that. It's just ARM community doesn't want to go there at this point of time but using above reasoning as justification for this series doesn't look good to me. So what ARM would like to support is not CPU hotplug but rather a fixed system with standby CPUs (that can be powered on/off on demand). With ACPI spec amended to support that (MADT present/enabled changes), it's good enough reason to add 'enabled' handling to acpi cpu-hotplug code instead of inventing alternative one that would do almost the same. But lets get rid of (can't/may not) language above and use standby CPUs reasoning to avoid any confusion. PS: I'm taking about hw hotplug (at QEMU level) and not kernel hotplug (where it's at logical cpu level). > To address this issue, introduce an `is_enabled` state in the `AcpiCpuStatus`, > which reflects whether a vCPU has been hot-plugged or hot-unplugged in QEMU, > marking it as (un)available in the Guest Kernel. good so far > The `is_present` state should > be set based on the `acpi_persistent` flag. In cases where unplugged vCPUs > need > to be deliberately simulated in the ACPI to maintain a persistent view of > vCPUs, > this flag ensures the guest kernel continues to see those vCPUs. that's where I have to disagree, vCPU is present when a corresponding QOM object exists. Faking it's presence will only confuse at the end. I get that you want to reuse device_add/del interface, but that leads to pulling the leg here and there to make thing fit. That in short therm (while someone remembers all tricks) might work for some, but long therm it's not sustainable). Maybe instead of pushing device_add/del, we should rather implement standby CPU model here, as ARM folks expect it to be. i.e. instead of device_add/del add 'enabled' property to ARM vCPU, and let management to turn on/off vCPUs on demand. (and very likely this model could be reused by other sock based platforms) For end-user it would be the same as device_add/del (i.e. vCPU becomes usable/unsable) I'd bet it would simplify your ARM CPU hotplug series a lot, since you won't have to fake vCPU object lifetime and do non trivial tricks to make it all work. Which it turn will make ARM hotplug series much more approachable for reviewers /and whomever comes later across that code/. Regardless of said, we would still need changes to ACPI cphp code, see my comments inline. > Additionally, introduce an `acpi_persistent` property that can be used to > initialize the ACPI vCPU presence state accordingly. Architectures requiring > ACPI to expose a persistent view of vCPUs can override its default value. > Refer > to the patch-set implelenting vCPU hotplug support for ARM for more details on > its usage. > > References: > [1] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU > Hotplug on > architectures that don’t Support CPU Hotplug (like ARM64) > a. Kernel Link: > https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf > b. Qemu Link: > https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf > [2] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug > on > SoC Based Systems (like ARM64) > Link: https://kvmforum2020.sched.com/event/eE4m > [3] Check comment 5 in the bugzilla entry > Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5 > > Signed-off-by: Salil Mehta <salil.me...@huawei.com> > --- > cpu-target.c | 1 + > hw/acpi/cpu.c | 35 ++++++++++++++++++++++++++++++++++- > include/hw/acpi/cpu.h | 21 +++++++++++++++++++++ > include/hw/core/cpu.h | 21 +++++++++++++++++++++ > 4 files changed, 77 insertions(+), 1 deletion(-) > > diff --git a/cpu-target.c b/cpu-target.c > index 499facf774..c8a29ab495 100644 > --- a/cpu-target.c > +++ b/cpu-target.c > @@ -200,6 +200,7 @@ static Property cpu_common_props[] = { > */ > DEFINE_PROP_LINK("memory", CPUState, memory, TYPE_MEMORY_REGION, > MemoryRegion *), > + DEFINE_PROP_BOOL("acpi-persistent", CPUState, acpi_persistent, false), I agree with Gavin, it's not CPU property/business, but a platform one. Pass it as argument to cpu_hotplug_hw_init(), and maybe rename to always_present. Then make sure that it's configurable in GED (which calls the function), and just turn it on for arm/virt machine. Other platforms might want to use x86 approach with GED and have vCPU actually disappearing. /loongson and maybe risc-v/ > #endif > DEFINE_PROP_END_OF_LIST(), > }; > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c > index 5cb60ca8bc..083c4010c2 100644 > --- a/hw/acpi/cpu.c > +++ b/hw/acpi/cpu.c > @@ -225,7 +225,40 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner, > state->dev_count = id_list->len; > state->devs = g_new0(typeof(*state->devs), state->dev_count); > for (i = 0; i < id_list->len; i++) { > - state->devs[i].cpu = CPU(id_list->cpus[i].cpu); > + struct CPUState *cpu = CPU(id_list->cpus[i].cpu); > + /* > + * In most architectures, CPUs that are marked as ACPI 'present' are > + * also ACPI 'enabled' by default. These states remain consistent at > + * both the QOM and ACPI levels. > + */ > + if (cpu) { > + state->devs[i].is_enabled = true; > + state->devs[i].is_present = true; > + state->devs[i].cpu = cpu; > + } else { > + state->devs[i].is_enabled = false; > + /* > + * In some architectures, even 'unplugged' or 'disabled' QOM CPUs > + * may be exposed as ACPI 'present.' This approach provides a > + * persistent view of the vCPUs to the guest kernel. This could > be > + * due to an architectural constraint that requires every per-CPU > + * component to be present at boot time, meaning the exact count > of > + * vCPUs must be known and cannot be altered after the kernel has > + * booted. As a result, the vCPU states at the QOM and ACPI > levels > + * might become inconsistent. However, in such cases, the > presence > + * of vCPUs has been deliberately simulated at the ACPI level. > + */ if cpus are not 'simulated', you will not need comments explaining that all over place and whole hunk would likely go away. > + if (acpi_persistent_cpu(first_cpu)) { > + state->devs[i].is_present = true; > + /* > + * `CPUHotplugState::AcpiCpuStatus::cpu` becomes > insignificant > + * in this case > + */ > + } else { > + state->devs[i].is_present = false; > + state->devs[i].cpu = cpu; > + } > + } > state->devs[i].arch_id = id_list->cpus[i].arch_id; > } > memory_region_init_io(&state->ctrl_reg, owner, &cpu_hotplug_ops, state, > diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h > index 32654dc274..bd3f9973c9 100644 > --- a/include/hw/acpi/cpu.h > +++ b/include/hw/acpi/cpu.h > @@ -26,6 +26,8 @@ typedef struct AcpiCpuStatus { > uint64_t arch_id; > bool is_inserting; > bool is_removing; > + bool is_present; with always_present, it might be better to move field to CPUHotplugState as it's not per vCPU anymore, and in standby case state->devs[i].cpu should work as implicit present flag. (see below wrt doc first comment) > + bool is_enabled; I'd move introduction of this field into a separate patch. BTW: new ABI/fields accessible by guest should be described in docs/specs/acpi_cpu_hotplug.rst. It would be better to have the spec as patch 1st, that we all agree on and then follow with implementation. And also include there an expected workflow for standby case. > bool fw_remove; > uint32_t ost_event; > uint32_t ost_status; > @@ -75,4 +77,23 @@ extern const VMStateDescription vmstate_cpu_hotplug; > VMSTATE_STRUCT(cpuhp, state, 1, \ > vmstate_cpu_hotplug, CPUHotplugState) > > +/** > + * acpi_persistent_cpu: > + * @cpu: The vCPU to check > + * > + * Checks if the vCPU state should always be reflected as *present* via ACPI > + * to the Guest. By default, this is False on all architectures and has to be > + * explicity set during initialization. > + * > + * Returns: True if it is ACPI 'persistent' CPU > + * > + */ > +static inline bool acpi_persistent_cpu(CPUState *cpu) > +{ > + /* > + * returns if 'Presence' of the vCPU is persistent and should be > simulated > + * via ACPI even after vCPUs have been unplugged in QOM > + */ > + return cpu && cpu->acpi_persistent; > +} > #endif > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h > index 04e9ad4996..299e96c45b 100644 > --- a/include/hw/core/cpu.h > +++ b/include/hw/core/cpu.h > @@ -542,6 +542,27 @@ struct CPUState { > CPUPluginState *plugin_state; > #endif > > + /* > + * To implement the vCPU hotplug feature (which simulates CPU hotplug > + * behavior), we need to dynamically create and destroy QOM vCPU objects, > + * and (de)associate them with pre-existing KVM vCPUs while (un)parking > the > + * KVM vCPU context. One challenge is ensuring that these dynamically > + * appearing or disappearing QOM vCPU objects are accurately reflected > + * through ACPI to the Guest Kernel. Due to architectural constraints, > + * changing the number of vCPUs after the guest kernel has booted may not > + * always be possible. > + * > + * In certain architectures, to provide the guest kernel with a > *persistent* > + * view of vCPU presence, even when the QOM does not have a corresponding > + * vCPU object, ACPI may simulate the presence of vCPUs by marking them > as > + * ACPI-disabled. This is achieved by setting `_STA.PRES=True` and > + * `_STA.Ena=False` for unplugged vCPUs in QEMU's QOM. > + * > + * By default, this flag is set to `FALSE`, and it must be explicitly set > + * to `TRUE` for architectures like ARM. > + */ > + bool acpi_persistent; > + > /* TODO Move common fields from CPUArchState here. */ > int cpu_index; > int cluster_index;