On 05/17/2014 11:45 AM, Alexey Kardashevskiy wrote: > On 05/17/2014 06:46 AM, Alexander Graf wrote: >> >> On 16.05.14 17:57, Alexey Kardashevskiy wrote: >>> On 05/17/2014 12:16 AM, Alexander Graf wrote: >>>> On 15.05.14 13:28, Alexey Kardashevskiy wrote: >>>>> Modern Linux kernels support last POWERPC CPUs so when a kernel boots, >>>>> in most cases it can find a matching cpu_spec in the kernel's cpu_specs >>>>> list. However if the kernel is quite old, it may be missing a definition >>>>> of the actual CPU. To provide an ability for old kernels to work on modern >>>>> hardware, a Processor Compatibility Mode has been introduced >>>>> by the PowerISA specification. >>>>> >>>>> From the hardware prospective, it is supported by the Processor >>>>> Compatibility Register (PCR) which is defined in PowerISA. The register >>>>> enables one of the compatibility modes (2.05/2.06/2.07). >>>>> Since PCR is a hypervisor privileged register and cannot be >>>>> accessed from the guest, the mode selection is done via >>>>> ibm,client-architecture-support (CAS) RTAS call using which the guest >>>>> specifies what "raw" and "architected" CPU versions it supports. >>>>> QEMU works out the best match, changes a "cpu-version" property of >>>>> every CPU and notifies the guest about the change by setting these >>>>> properties in the buffer passed as a response on a custom H_CAS hypercall. >>>>> >>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>> --- >>>>> hw/ppc/spapr.c | 40 +++++++++++++++++++++++++ >>>>> hw/ppc/spapr_hcall.c | 83 >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> trace-events | 5 ++++ >>>>> 3 files changed, 128 insertions(+) >>>>> >>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>>>> index a2c9106..a0882a1 100644 >>>>> --- a/hw/ppc/spapr.c >>>>> +++ b/hw/ppc/spapr.c >>>>> @@ -35,6 +35,7 @@ >>>>> #include "kvm_ppc.h" >>>>> #include "mmu-hash64.h" >>>>> #include "cpu-models.h" >>>>> +#include "qom/cpu.h" >>>>> #include "hw/boards.h" >>>>> #include "hw/ppc/ppc.h" >>>>> @@ -592,11 +593,50 @@ int spapr_h_cas_compose_response(target_ulong addr, >>>>> target_ulong size) >>>>> { >>>>> void *fdt; >>>>> sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 }; >>>>> + CPUState *cs; >>>>> + int smt = kvmppc_smt_threads(); >>>>> size -= sizeof(hdr); >>>>> fdt = g_malloc0(size); >>>>> _FDT((fdt_create(fdt, size))); >>>>> + _FDT((fdt_begin_node(fdt, "cpus"))); >>>>> + >>>>> + CPU_FOREACH(cs) { >>>>> + PowerPCCPU *cpu = POWERPC_CPU(cs); >>>>> + DeviceClass *dc = DEVICE_GET_CLASS(cpu); >>>>> + int smpt = spapr_get_compat_smp_threads(cpu); >>>>> + int i, index = ppc_get_vcpu_dt_id(cpu); >>>>> + uint32_t servers_prop[smpt]; >>>>> + uint32_t gservers_prop[smpt * 2]; >>>>> + char tmp[32]; >>>>> + >>>>> + if ((index % smt) != 0) { >>>>> + continue; >>>>> + } >>>>> + >>>>> + snprintf(tmp, 32, "%s@%x", dc->fw_name, index); >>>>> + trace_spapr_cas_add_subnode(tmp); >>>>> + >>>>> + _FDT((fdt_begin_node(fdt, tmp))); >>>>> + _FDT((fdt_property_cell(fdt, "cpu-version", cpu->cpu_version))); >>>>> + >>>>> + /* Build interrupt servers and gservers properties */ >>>>> + for (i = 0; i < smpt; i++) { >>>>> + servers_prop[i] = cpu_to_be32(index + i); >>>>> + /* Hack, direct the group queues back to cpu 0 */ >>>>> + gservers_prop[i*2] = cpu_to_be32(index + i); >>>>> + gservers_prop[i*2 + 1] = 0; >>>>> + } >>>>> + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s", >>>>> + servers_prop, sizeof(servers_prop)))); >>>>> + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s", >>>>> + gservers_prop, sizeof(gservers_prop)))); >>>>> + >>>>> + _FDT((fdt_end_node(fdt))); >>>>> + } >>>>> + >>>>> + _FDT((fdt_end_node(fdt))); >>>> Why is this so much code? Can we only replace full nodes? >>> It is a diff tree, in only has properties to change. >> >> So why do we change so much then? :) > > > 3 (three) properties are "much" now? :) > >>>> Then please >>>> extract the original CPU node creation into a function and just call it >>>> from the original generation and from here. >>>> >>>> If we can also replace single properties I'd say we only need to override >>>> cpu-version. >>> >>> Mmm. The user could run qemu with threads=8 on POWER8 with SLES11 which >>> must not see 8 threads but it can update itself and reboot with the kernel >>> which is aware of POWER8. You are saying we do not want to support that? >> >> First off, I think that practically speaking people will want to use >> 2-thread granularity on POWER8 because that gives them the best load >> exploitation on the system. > > Practically speaking, people will want to use kernels which can work on > POWER8 in raw mode, no? Here we are trying to fix older guests. > >> So if we just disable CPU nodes that would not be visible usually (there >> was a disabled flag in cpu nodes, right?) for threads that are incompatible >> with a new CAS mode, we can then remove the disabled flag when the guest >> accepts more threads again.
I just realized where misunderstanding happened - when I switch from 4 threads per core to 2 threads per core, the number of CPU nodes does not change, only server#s and gserver#s properties do and this is the only way of telling the guest the new number of threads. So if you still want to disable some nodes, please explain more. Thanks! >> That means we could expose 8 threads (POWER8), guest only supports 2 >> threads (POWER6), so we waste 6 threads. But the remaining 2 will be fast >> ;). We basically degrade the system to SMT2 mode. > > We do not want to show more threads (i.e. bigger ibm,ppc-interrupt-server#s > properties) if we do not support more threads as we do not really know what > guest kernel (including AIX) or some ppc64_cpu/drmgr/whatever tool will do > if they find more. > >> What does pHyp do here? > > I wish it was easy to verify :) > > I fail to see what exactly your objections are all about, I definitely miss > something big here. Is it a minor code duplication or some dangerous bug? > If it is just a code duplication, I could do something about it. > > -- Alexey