NUMA nodes corresponding to GPU memory currently have the same affinity/distance as normal memory nodes. Add a third NUMA associativity reference point enabling us to give GPU nodes more distance.
This is guest visible information, which shouldn't change under a running guest across migration between different qemu versions, so make the change effective only in new (pseries > 5.0) machine types. Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5): node distances: node 0 1 2 3 4 5 0: 10 40 40 40 40 40 1: 40 10 40 40 40 40 2: 40 40 10 40 40 40 3: 40 40 40 10 40 40 4: 40 40 40 40 10 40 5: 40 40 40 40 40 10 After: node distances: node 0 1 2 3 4 5 0: 10 40 80 80 80 80 1: 40 10 80 80 80 80 2: 80 80 10 80 80 80 3: 80 80 80 10 80 80 4: 80 80 80 80 10 80 5: 80 80 80 80 80 10 These are the same distances as on the host, mirroring the change made to host firmware in skiboot commit f845a648b8cb ("numa/associativity: Add a new level of NUMA for GPU's"). Signed-off-by: Reza Arbab <ar...@linux.ibm.com> --- hw/ppc/spapr.c | 11 +++++++++-- hw/ppc/spapr_pci_nvlink2.c | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 88b4a1f17716..1d9193d5ee49 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -893,7 +893,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt) int rtas; GString *hypertas = g_string_sized_new(256); GString *qemu_hypertas = g_string_sized_new(256); - uint32_t refpoints[] = { cpu_to_be32(0x4), cpu_to_be32(0x4) }; + uint32_t refpoints[] = { + cpu_to_be32(0x4), + cpu_to_be32(0x4), + cpu_to_be32(0x2), + }; uint32_t nr_refpoints; uint64_t max_device_addr = MACHINE(spapr)->device_memory->base + memory_region_size(&MACHINE(spapr)->device_memory->mr); @@ -4544,7 +4548,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) smc->linux_pci_probe = true; smc->smp_threads_vsmt = true; smc->nr_xirqs = SPAPR_NR_XIRQS; - smc->nr_assoc_refpoints = 2; + smc->nr_assoc_refpoints = 3; xfc->match_nvt = spapr_match_nvt; } @@ -4611,8 +4615,11 @@ DEFINE_SPAPR_MACHINE(5_1, "5.1", true); */ static void spapr_machine_5_0_class_options(MachineClass *mc) { + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); + spapr_machine_5_1_class_options(mc); compat_props_add(mc->compat_props, hw_compat_5_0, hw_compat_5_0_len); + smc->nr_assoc_refpoints = 2; } DEFINE_SPAPR_MACHINE(5_0, "5.0", false); diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c index 8332d5694e46..247fd48731e2 100644 --- a/hw/ppc/spapr_pci_nvlink2.c +++ b/hw/ppc/spapr_pci_nvlink2.c @@ -362,7 +362,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt) uint32_t associativity[] = { cpu_to_be32(0x4), SPAPR_GPU_NUMA_ID, - SPAPR_GPU_NUMA_ID, + cpu_to_be32(nvslot->numa_id), SPAPR_GPU_NUMA_ID, cpu_to_be32(nvslot->numa_id) }; -- 2.18.2