On 05/17/2014 06:46 AM, Alexander Graf wrote: > > On 16.05.14 17:57, Alexey Kardashevskiy wrote: >> On 05/17/2014 12:16 AM, Alexander Graf wrote: >>> On 15.05.14 13:28, Alexey Kardashevskiy wrote: >>>> Modern Linux kernels support last POWERPC CPUs so when a kernel boots, >>>> in most cases it can find a matching cpu_spec in the kernel's cpu_specs >>>> list. However if the kernel is quite old, it may be missing a definition >>>> of the actual CPU. To provide an ability for old kernels to work on modern >>>> hardware, a Processor Compatibility Mode has been introduced >>>> by the PowerISA specification. >>>> >>>> From the hardware prospective, it is supported by the Processor >>>> Compatibility Register (PCR) which is defined in PowerISA. The register >>>> enables one of the compatibility modes (2.05/2.06/2.07). >>>> Since PCR is a hypervisor privileged register and cannot be >>>> accessed from the guest, the mode selection is done via >>>> ibm,client-architecture-support (CAS) RTAS call using which the guest >>>> specifies what "raw" and "architected" CPU versions it supports. >>>> QEMU works out the best match, changes a "cpu-version" property of >>>> every CPU and notifies the guest about the change by setting these >>>> properties in the buffer passed as a response on a custom H_CAS hypercall. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>> --- >>>> hw/ppc/spapr.c | 40 +++++++++++++++++++++++++ >>>> hw/ppc/spapr_hcall.c | 83 >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> trace-events | 5 ++++ >>>> 3 files changed, 128 insertions(+) >>>> >>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>>> index a2c9106..a0882a1 100644 >>>> --- a/hw/ppc/spapr.c >>>> +++ b/hw/ppc/spapr.c >>>> @@ -35,6 +35,7 @@ >>>> #include "kvm_ppc.h" >>>> #include "mmu-hash64.h" >>>> #include "cpu-models.h" >>>> +#include "qom/cpu.h" >>>> #include "hw/boards.h" >>>> #include "hw/ppc/ppc.h" >>>> @@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr, >>>> target_ulong size) >>>> { >>>> void *fdt; >>>> sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 }; >>>> + CPUState *cs; >>>> + int smt = kvmppc_smt_threads(); >>>> size -= sizeof(hdr); >>>> fdt = g_malloc0(size); >>>> _FDT((fdt_create(fdt, size))); >>>> + _FDT((fdt_begin_node(fdt, "cpus"))); >>>> + >>>> + CPU_FOREACH(cs) { >>>> + PowerPCCPU *cpu = POWERPC_CPU(cs); >>>> + DeviceClass *dc = DEVICE_GET_CLASS(cpu); >>>> + int smpt = spapr_get_compat_smp_threads(cpu); >>>> + int i, index = ppc_get_vcpu_dt_id(cpu); >>>> + uint32_t servers_prop[smpt]; >>>> + uint32_t gservers_prop[smpt * 2]; >>>> + char tmp[32]; >>>> + >>>> + if ((index % smt) != 0) { >>>> + continue; >>>> + } >>>> + >>>> + snprintf(tmp, 32, "%s@%x", dc->fw_name, index); >>>> + trace_spapr_cas_add_subnode(tmp); >>>> + >>>> + _FDT((fdt_begin_node(fdt, tmp))); >>>> + _FDT((fdt_property_cell(fdt, "cpu-version", cpu->cpu_version))); >>>> + >>>> + /* Build interrupt servers and gservers properties */ >>>> + for (i = 0; i < smpt; i++) { >>>> + servers_prop[i] = cpu_to_be32(index + i); >>>> + /* Hack, direct the group queues back to cpu 0 */ >>>> + gservers_prop[i*2] = cpu_to_be32(index + i); >>>> + gservers_prop[i*2 + 1] = 0; >>>> + } >>>> + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s", >>>> + servers_prop, sizeof(servers_prop)))); >>>> + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s", >>>> + gservers_prop, sizeof(gservers_prop)))); >>>> + >>>> + _FDT((fdt_end_node(fdt))); >>>> + } >>>> + >>>> + _FDT((fdt_end_node(fdt))); >>> Why is this so much code? Can we only replace full nodes? >> It is a diff tree, in only has properties to change. > > So why do we change so much then? :)
3 (three) properties are "much" now? :) >>> Then please >>> extract the original CPU node creation into a function and just call it >>> from the original generation and from here. >>> >>> If we can also replace single properties I'd say we only need to override >>> cpu-version. >> >> Mmm. The user could run qemu with threads=8 on POWER8 with SLES11 which >> must not see 8 threads but it can update itself and reboot with the kernel >> which is aware of POWER8. You are saying we do not want to support that? > > First off, I think that practically speaking people will want to use > 2-thread granularity on POWER8 because that gives them the best load > exploitation on the system. Practically speaking, people will want to use kernels which can work on POWER8 in raw mode, no? Here we are trying to fix older guests. > So if we just disable CPU nodes that would not be visible usually (there > was a disabled flag in cpu nodes, right?) for threads that are incompatible > with a new CAS mode, we can then remove the disabled flag when the guest > accepts more threads again. > > That means we could expose 8 threads (POWER8), guest only supports 2 > threads (POWER6), so we waste 6 threads. But the remaining 2 will be fast > ;). We basically degrade the system to SMT2 mode. We do not want to show more threads (i.e. bigger ibm,ppc-interrupt-server#s properties) if we do not support more threads as we do not really know what guest kernel (including AIX) or some ppc64_cpu/drmgr/whatever tool will do if they find more. > What does pHyp do here? I wish it was easy to verify :) I fail to see what exactly your objections are all about, I definitely miss something big here. Is it a minor code duplication or some dangerous bug? If it is just a code duplication, I could do something about it. -- Alexey