Hi,
Using a QEMU pseries guest with the following SMP topology, with a single
NUMA node:

(...) -smp 32,threads=4,cores=4,sockets=2 (...)

This is the output of lscpu with a guest running v5.12-rc5:

[root@localhost ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  4
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-31
[root@localhost ~]#

The changes with cpu_core_mask made the topology sockets match NUMA
nodes. In this case, given that we have a single NUMA node, the SMP
topology was adjusted to 8 cores per socket instead of 4, so that we end
up with a single socket as well. Although sockets being equal to NUMA
nodes is true for Power hardware, QEMU doesn't have this constraint, and
users expect sockets and NUMA nodes to be more or less independent,
regardless of how impractical that would be with real hardware.

The same guest running a kernel with this series applied:

[root@localhost ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  4
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        1
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-31

The sockets and NUMA nodes are now represented separately, as intended
via the QEMU command line.

Thanks for looking into this, Srikar.

For all patches:

Tested-by: Daniel Henrique Barboza <danielhb...@gmail.com>

On 4/15/21 9:09 AM, Srikar Dronamraju wrote:
Daniel had reported that QEMU is now unable to see the requested
topology in a multi-socket, single NUMA node configuration:

  -smp 8,maxcpus=8,cores=2,threads=2,sockets=2

This patchset reintroduces cpu_core_mask so that users can see the
requested topologies, while still maintaining the boot time of very
large system configurations. It also caches the chip_id, as suggested
by Michael Ellerman.

4 threads/core; 4 cores/socket; 4 sockets/node; 2 nodes in the system:

  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1 \
  -smp 128,sockets=8,threads=4,maxcpus=128 \

5.12.0-rc5 (or any kernel with commit 4ca234a9cbd7)
---------------------------------------------------
srikar@cloudy:~$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2        <<<<<-----
NUMA node(s):        2
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           1 MiB
L1i cache:           1 MiB
NUMA node0 CPU(s):   0-15,32-47,64-79,96-111
NUMA node1 CPU(s):   16-31,48-63,80-95,112-127
--
srikar@cloudy:~$ dmesg | grep smp
[    0.010658] smp: Bringing up secondary CPUs ...
[    0.424681] smp: Brought up 2 nodes, 128 CPUs
--

5.12.0-rc5 + 3 patches
----------------------
srikar@cloudy:~$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  4
Socket(s):           8        <<<<-----
NUMA node(s):        2
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           1 MiB
L1i cache:           1 MiB
NUMA node0 CPU(s):   0-15,32-47,64-79,96-111
NUMA node1 CPU(s):   16-31,48-63,80-95,112-127
--
srikar@cloudy:~$ dmesg | grep smp
[    0.010372] smp: Bringing up secondary CPUs ...
[    0.417892] smp: Brought up 2 nodes, 128 CPUs

5.12.0-rc5
----------
srikar@cloudy:~$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              1024
On-line CPU(s) list: 0-1023
Thread(s) per core:  8
Core(s) per socket:  128
Socket(s):           1
NUMA node(s):        1
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           4 MiB
L1i cache:           4 MiB
NUMA node0 CPU(s):   0-1023

srikar@cloudy:~$ dmesg | grep smp
[    0.027753] smp: Bringing up secondary CPUs ...
[    2.315193] smp: Brought up 1 node, 1024 CPUs

5.12.0-rc5 + 3 patches
----------------------
srikar@cloudy:~$ dmesg | grep smp
[    0.027659] smp: Bringing up secondary CPUs ...
[    2.532739] smp: Brought up 1 node, 1024 CPUs

I have also booted and tested these kernels on PowerVM and PowerNV, and
even there I see only a very negligible increase in the time taken to
bring up the secondary CPUs.

Srikar Dronamraju (3):
  powerpc/smp: Reintroduce cpu_core_mask
  Revert "powerpc/topology: Update topology_core_cpumask"
  powerpc/smp: Cache CPU to chip lookup

 arch/powerpc/include/asm/smp.h      |  6 ++++
 arch/powerpc/include/asm/topology.h |  2 +-
 arch/powerpc/kernel/prom.c          | 19 +++++++---
 arch/powerpc/kernel/smp.c           | 56 +++++++++++++++++++++++++----
 4 files changed, 71 insertions(+), 12 deletions(-)
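As a footnote to the cover letter above, the effect of reintroducing
cpu_core_mask can be sketched with a toy model (illustrative Python, not
the kernel code; the function name and data layout here are made up for
the sketch): CPUs that share a cached chip id are grouped into one
socket mask, independently of which NUMA node they sit in, which is why
lscpu can again report more sockets than NUMA nodes.

```python
def core_masks(chip_ids):
    """Map each chip id to the list of CPUs on that chip (one 'socket').

    chip_ids[cpu] gives the chip id of that CPU, mimicking a cached
    per-CPU chip_id lookup.
    """
    masks = {}
    for cpu, chip in enumerate(chip_ids):
        masks.setdefault(chip, []).append(cpu)
    return masks

# 8 CPUs spread over 2 chips within a single NUMA node: the topology
# should show 2 sockets even though there is only 1 node.
print(core_masks([0, 0, 0, 0, 1, 1, 1, 1]))
# -> {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

With sockets derived from chip ids rather than from NUMA nodes, a
`-smp 32,threads=4,cores=4,sockets=2` guest on one NUMA node can report
2 sockets, matching what was requested on the QEMU command line.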