Hi Thiago,

On 7/8/20 1:28 AM, Thiago Jung Bauermann wrote:
>
> Hello Eduardo,
>
> Eduardo Habkost <ehabk...@redhat.com> writes:
>
>> On Tue, Jul 07, 2020 at 05:43:33PM -0300, Thiago Jung Bauermann wrote:
>>> PowerPC sPAPR CPUs start in the halted state, but generic QEMU code
>>> assumes that CPUs start in the non-halted state. spapr_reset_vcpu()
>>> attempts to rectify this by setting CPUState::halted to 1. But that's too
>>> late for hotplugged CPUs in a machine configured with 2 or more threads per
>>> core.
>>>
>>> By then, other parts of QEMU have already caused the vCPU to run in an
>>> uninitialized state a couple of times. For example, ppc_cpu_reset() calls
>>> ppc_tlb_invalidate_all(), which ends up calling async_run_on_cpu(). This
>>> kicks the new vCPU while it has CPUState::halted = 0, causing QEMU to issue
>>> a KVM_RUN ioctl on the new vCPU before the guest is able to make the
>>> start-cpu RTAS call to initialize its register state.
>>>
>>> This doesn't seem to cause visible issues for regular guests, but on a
>>> secure guest running under the Ultravisor it does. The Ultravisor relies on
>>> being able to snoop on the start-cpu RTAS call to map vCPUs to guests, and
>>> this issue causes it to see a stray vCPU that doesn't belong to any guest.
>>>
>>> Fix by adding a starts_halted() method to the CPUState class, and making it
>>> return 1 if the machine is an sPAPR guest.
>>>
>>> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.ibm.com>
>> [...]
>>> +static uint32_t ppc_cpu_starts_halted(void)
>>> +{
>>> +    SpaprMachineState *spapr =
>>> +        (SpaprMachineState *) object_dynamic_cast(qdev_get_machine(),
>>> +                                                  TYPE_SPAPR_MACHINE);
>>
>> Wouldn't it be simpler to just implement this as a MachineClass
>> boolean field? e.g.:

A class boolean field certainly sounds better, but I am not sure this is
a property of the machine rather than of the arch. If it is the latter,
should the field move to CPUClass? Maybe not, let's discuss :)

>>
>> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
>
> Yes, indeed it would. Thanks for this patch. I just tested and it
> also solves the problem (except for the nit mentioned below).
>
> Tested-by: Thiago Jung Bauermann <bauer...@linux.ibm.com>
>
> Should I submit a proper patch with these changes (with you as the
> author)?
>
>> ---
>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>> index 426ce5f625..ffadc7a17d 100644
>> --- a/include/hw/boards.h
>> +++ b/include/hw/boards.h
>> @@ -215,6 +215,7 @@ struct MachineClass {
>>      bool nvdimm_supported;
>>      bool numa_mem_supported;
>>      bool auto_enable_numa;
>> +    bool cpu_starts_halted;
>>      const char *default_ram_id;
>>
>>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
>> index 0f23409f1d..08dd504034 100644
>> --- a/hw/core/cpu.c
>> +++ b/hw/core/cpu.c
>> @@ -252,6 +252,7 @@ static void cpu_common_reset(DeviceState *dev)
>>  {
>>      CPUState *cpu = CPU(dev);
>>      CPUClass *cc = CPU_GET_CLASS(cpu);
>> +    MachineState *machine = object_dynamic_cast(qdev_get_machine(), TYPE_MACHINE);
>
> I had to add a (MachineState *) cast here to get the code to compile.

Btw, why not use MACHINE(qdev_get_machine())?

>
>>
>>      if (qemu_loglevel_mask(CPU_LOG_RESET)) {
>>          qemu_log("CPU Reset (CPU %d)\n", cpu->cpu_index);
>> @@ -259,7 +260,7 @@ static void cpu_common_reset(DeviceState *dev)
>>      }
>>
>>      cpu->interrupt_request = 0;
>> -    cpu->halted = 0;
>> +    cpu->halted = machine ? MACHINE_GET_CLASS(machine)->cpu_starts_halted : 0;
>>      cpu->mem_io_pc = 0;
>>      cpu->icount_extra = 0;
>>      atomic_set(&cpu->icount_decr_ptr->u32, 0);
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index f6f034d039..d16ec33033 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4487,6 +4487,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
>>      mc->has_hotpluggable_cpus = true;
>>      mc->nvdimm_supported = true;
>> +    mc->cpu_starts_halted = true;
>>      smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
>>      fwc->get_dev_path = spapr_get_fw_dev_path;
>>      nc->nmi_monitor_handler = spapr_nmi;
>>
>>> +
>>> +    /*
>>> +     * In sPAPR, all CPUs start halted. CPU0 is unhalted from the machine level
>>> +     * reset code and the rest are explicitly started up by the guest using an
>>> +     * RTAS call.
>>> +     */
>>> +    return spapr != NULL;
>>> +}
>>> +
>
>
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
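
To make the MachineClass-vs-CPUClass question concrete, here is a rough
sketch of the two placements. It is a toy model and not QEMU code: the
Toy* types and toy_* functions are made up for illustration, and only the
cpu_starts_halted field name comes from the patch above. The CPUClass-level
field (starts_halted below) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the QEMU types; these are NOT the real definitions. */
typedef struct ToyMachineClass {
    bool cpu_starts_halted;     /* field name taken from the patch above */
} ToyMachineClass;

typedef struct ToyCPUClass {
    bool starts_halted;         /* hypothetical CPUClass-level alternative */
} ToyCPUClass;

typedef struct ToyCPUState {
    const ToyCPUClass *cc;
    uint32_t halted;
    uint32_t interrupt_request;
} ToyCPUState;

/*
 * Option A: common reset asks the machine class.
 * mc may be NULL to model "no machine object available", in which case
 * the CPU starts non-halted, mirroring the "machine ? ... : 0" fallback.
 */
void toy_reset_from_machine(ToyCPUState *cpu, const ToyMachineClass *mc)
{
    assert(cpu != NULL);
    cpu->interrupt_request = 0;
    cpu->halted = mc ? mc->cpu_starts_halted : 0;
}

/*
 * Option B: common reset asks the CPU's own class, so no machine lookup
 * is needed at reset time.
 */
void toy_reset_from_cpu_class(ToyCPUState *cpu)
{
    assert(cpu != NULL && cpu->cc != NULL);
    cpu->interrupt_request = 0;
    cpu->halted = cpu->cc->starts_halted;
}
```

In this toy model, option A needs a machine object to consult at reset
time, while option B carries the default with the CPU class, at the cost
of being the same for every machine that uses that CPU type.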