Implementations of the ARM architecture can define a CPU hierarchy
of at most "sockets/dies/clusters/cores/threads".
For example, the ARM64 server chip Kunpeng 920 has 2 sockets,
2 NUMA nodes (each of which is also a CPU die) in each socket,
6 clusters in each NUMA node and 4 cores in each cluster, and it
does not support SMT. Clusters within the same NUMA node share an
L3 cache, and cores within the same cluster share an L2 cache.
The cache affinity of ARM clusters has been shown to improve kernel
scheduling performance, and a patchset has been posted that adds a
general sched_domain for clusters and a cluster level to the
arch-neutral CPU topology struct, as below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};
In virtualization, exposing the cluster-level topology to the guest
kernel may also improve scheduling performance. So let's add support
for the "-smp clusters=*" command line option for ARM CPUs, so that
users can define a four-level CPU hierarchy for machines:
sockets/clusters/cores/threads.
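As a rough illustration of the four-level hierarchy (not code from this patch), the sketch below multiplies the levels to get the CPU count they imply. The struct and function names are hypothetical and are not QEMU types; the only assumption is that each level counts instances per parent level.

```c
#include <stdbool.h>

/* Hypothetical holder for values given to -smp; not a QEMU structure. */
struct smp_topo_sketch {
    unsigned sockets;   /* sockets in the machine */
    unsigned clusters;  /* clusters per socket */
    unsigned cores;     /* cores per cluster */
    unsigned threads;   /* threads per core */
};

/* Number of CPUs implied by sockets/clusters/cores/threads. */
static unsigned smp_topo_total(const struct smp_topo_sketch *t)
{
    return t->sockets * t->clusters * t->cores * t->threads;
}

/* True when the hierarchy accounts for exactly maxcpus CPUs. */
static bool smp_topo_matches(const struct smp_topo_sketch *t, unsigned maxcpus)
{
    return smp_topo_total(t) == maxcpus;
}
```

For a Kunpeng 920-like layout, folding the 2 dies of 6 clusters into 12 clusters per socket gives sockets=2, clusters=12, cores=4, threads=1, i.e. 96 CPUs.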
Because clusters are currently supported only for ARM CPUs, the new
member "smp_clusters" is added only to the VirtMachineState structure.
Signed-off-by: Yanan Wang <wangyana...@huawei.com>
---
include/hw/arm/virt.h | 1 +
qemu-options.hx | 26 +++++++++++++++-----------
softmmu/vl.c | 3 +++
3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f546dd2023..74fff9667b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -156,6 +156,7 @@ struct VirtMachineState {
char *pciehb_nodename;
const int *irqmap;
int fdt_size;
+ unsigned smp_clusters;
uint32_t clock_phandle;
uint32_t gic_phandle;
uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index bd97086c21..245eb415a6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@ SRST
ERST
DEF("smp", HAS_ARG, QEMU_OPTION_smp,
- "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+ "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
" set the number of CPUs to 'n' [default=1]\n"
" maxcpus= maximum number of total cpus, including\n"
" offline CPUs for hotplug, etc\n"
- " cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+ " cores= number of CPU cores on one socket\n"
+ " (it's on one die for PC, and on one cluster for ARM)\n"
" threads= number of threads on one CPU core\n"
+ " clusters= number of CPU clusters on one socket (for ARM only)\n"
" dies= number of CPU dies on one socket (for PC only)\n"
" sockets= number of discrete sockets in the system\n",
QEMU_ARCH_ALL)
SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
- Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
- are supported. On Sparc32 target, Linux limits the number of usable
- CPUs to 4. For the PC target, the number of cores per die, the
- number of threads per cores, the number of dies per packages and the
- total number of sockets can be specified. Missing values will be
- computed. If any on the three values is given, the total number of
- CPUs n can be omitted. maxcpus specifies the maximum number of
- hotpluggable CPUs.
+``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
+ Simulate an SMP system with n CPUs. On the PC target, up to 255
+ CPUs are supported. On the Sparc32 target, Linux limits the number
+ of usable CPUs to 4. For the PC target, the number of threads per
+ core, the number of cores per die, the number of dies per package
+ and the total number of sockets can be specified. For the ARM target,
+ the number of threads per core, the number of cores per cluster, the
+ number of clusters per socket and the total number of sockets can be
+ specified. And missing values will be computed. If any of the five