On Fri, Apr 24, 2015 at 12:17:27PM +0530, Bharata B Rao wrote:
> Reorganize CPU device tree generation code so that it be reused from
> hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
> instead of spapr_create_fdt_skel().
Creating CPU DT entries from spapr_finalize_fdt() instead of
spapr_create_fdt_skel() has an interesting side effect.

Before this patch, when I boot an SMP guest with the following
configuration:

-smp 4 -numa node,cpus=0-1,mem=4G,nodeid=0 -numa node,cpus=2-3,mem=4G,nodeid=1

the guest CPUs come up in the following fashion:

[root@localhost ~]# cat /proc/cpuinfo
processor   : 0
cpu dt id   : 0
processor   : 1
cpu dt id   : 8
processor   : 2
cpu dt id   : 16
processor   : 3
cpu dt id   : 24

In the above /proc/cpuinfo output, only the relevant fields are retained,
and the newly added field "cpu dt id" is essentially obtained from
arch/powerpc/include/asm/smp.h:get_hard_smp_processor_id() in the kernel.

[root@localhost ~]# lscpu
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          2
NUMA node0 CPU(s):     0,1
NUMA node1 CPU(s):     2,3

Here CPUs 0,1 are in node0 and 2,3 are in node1, as specified. The same
is reported by the QEMU monitor below.

(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 1 cpus: 2 3
node 1 size: 4096 MB

After this patch, where the CPU DT entries are built entirely in
spapr_finalize_fdt(), the CPU enumeration done by the guest kernel gets
reversed, since the CPU DT nodes end up being discovered by the guest
kernel in the reverse order in
arch/powerpc/kernel/setup-common.c:smp_setup_cpu_maps(). With this, the
resulting guest SMP configuration looks like this:

[root@localhost ~]# cat /proc/cpuinfo
processor   : 0
cpu dt id   : 24    <--- was 0 previously
processor   : 1
cpu dt id   : 16    <--- was 8 previously
processor   : 2
cpu dt id   : 8     <--- was 16 previously
processor   : 3
cpu dt id   : 0     <--- was 24 previously

[root@localhost ~]# lscpu
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          2
NUMA node0 CPU(s):     2,3  <--- node0 was supposed to have 0,1
NUMA node1 CPU(s):     0,1  <--- node1 was supposed to have 2,3

(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 1 cpus: 2 3
node 1 size: 4096 MB

This is not wrong per se, because the CPUs with the correct DT ids ended
up on the right NUMA nodes; it is just that the CPU numbers assigned by
the guest got reversed. Is this acceptable, or will this break some
userspace?

In both cases I am adding the CPU DT nodes from QEMU in the same order,
but I am not sure why the guest kernel discovers them in different
orders in each case.

> +static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
> +{
> +    CPUState *cs;
> +    int cpus_offset;
> +    char *nodename;
> +    int smt = kvmppc_smt_threads();
> +
> +    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
> +    _FDT(cpus_offset);
> +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#address-cells", 0x1)));
> +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#size-cells", 0x0)));
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        int index = ppc_get_vcpu_dt_id(cpu);
> +        DeviceClass *dc = DEVICE_GET_CLASS(cs);
> +        int offset;
> +
> +        if ((index % smt) != 0) {
> +            continue;
> +        }
> +
> +        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
> +        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> +        g_free(nodename);
> +        _FDT(offset);
> +        spapr_populate_cpu_dt(cs, fdt, offset);
> +    }

I can simply fix this by walking the CPUs in reverse order in the above
code, which makes the guest kernel discover the CPU DT nodes in the
right order. s/CPU_FOREACH(cs)/CPU_FOREACH_REVERSE(cs)/ will solve this
problem.
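For reference, a minimal sketch of how the loop in the patch would read
with that substitution; the body is unchanged from the hunk quoted above,
only the iteration macro differs (CPU_FOREACH_REVERSE walks the vCPU list
tail-to-head):

    /* Walk vCPUs in reverse so the guest, which discovers the cpu nodes
     * in the opposite order from this path, enumerates them ascending. */
    CPU_FOREACH_REVERSE(cs) {
        PowerPCCPU *cpu = POWERPC_CPU(cs);
        int index = ppc_get_vcpu_dt_id(cpu);
        DeviceClass *dc = DEVICE_GET_CLASS(cs);
        int offset;

        /* Only the first thread of each core gets a DT node */
        if ((index % smt) != 0) {
            continue;
        }

        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
        g_free(nodename);
        _FDT(offset);
        spapr_populate_cpu_dt(cs, fdt, offset);
    }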
Would this be the right approach, or should we just leave it to the guest
kernel to discover and enumerate CPUs in whatever order it finds the DT
nodes in the FDT?

Regards,
Bharata.