On Mon, 18 May 2020 16:44:18 -0500 Reza Arbab <ar...@linux.ibm.com> wrote:
> NUMA nodes corresponding to GPU memory currently have the same > affinity/distance as normal memory nodes. Add a third NUMA associativity > reference point enabling us to give GPU nodes more distance. > > This is guest visible information, which shouldn't change under a > running guest across migration between different qemu versions, so make > the change effective only in new (pseries > 5.0) machine types. > > Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5): > > node distances: > node 0 1 2 3 4 5 > 0: 10 40 40 40 40 40 > 1: 40 10 40 40 40 40 > 2: 40 40 10 40 40 40 > 3: 40 40 40 10 40 40 > 4: 40 40 40 40 10 40 > 5: 40 40 40 40 40 10 > > After: > > node distances: > node 0 1 2 3 4 5 > 0: 10 40 80 80 80 80 > 1: 40 10 80 80 80 80 > 2: 80 80 10 80 80 80 > 3: 80 80 80 10 80 80 > 4: 80 80 80 80 10 80 > 5: 80 80 80 80 80 10 > > These are the same distances as on the host, mirroring the change made > to host firmware in skiboot commit f845a648b8cb ("numa/associativity: > Add a new level of NUMA for GPU's"). > > Signed-off-by: Reza Arbab <ar...@linux.ibm.com> > --- > hw/ppc/spapr.c | 11 +++++++++-- > hw/ppc/spapr_pci_nvlink2.c | 2 +- > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 88b4a1f17716..1d9193d5ee49 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -893,7 +893,11 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void > *fdt) > int rtas; > GString *hypertas = g_string_sized_new(256); > GString *qemu_hypertas = g_string_sized_new(256); > - uint32_t refpoints[] = { cpu_to_be32(0x4), cpu_to_be32(0x4) }; > + uint32_t refpoints[] = { > + cpu_to_be32(0x4), > + cpu_to_be32(0x4), > + cpu_to_be32(0x2), > + }; > uint32_t nr_refpoints; > uint64_t max_device_addr = MACHINE(spapr)->device_memory->base + > memory_region_size(&MACHINE(spapr)->device_memory->mr); > @@ -4544,7 +4548,7 @@ static void spapr_machine_class_init(ObjectClass *oc, > void *data) > smc->linux_pci_probe = true; > smc->smp_threads_vsmt = true; > smc->nr_xirqs = SPAPR_NR_XIRQS; > - smc->nr_assoc_refpoints = 2; > + smc->nr_assoc_refpoints = 3; > xfc->match_nvt = spapr_match_nvt; > } > > @@ -4611,8 +4615,11 @@ DEFINE_SPAPR_MACHINE(5_1, "5.1", true); > */ > static void spapr_machine_5_0_class_options(MachineClass *mc) > { > + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); > + > spapr_machine_5_1_class_options(mc); > compat_props_add(mc->compat_props, hw_compat_5_0, hw_compat_5_0_len); > + smc->nr_assoc_refpoints = 2; > } > > DEFINE_SPAPR_MACHINE(5_0, "5.0", false); > diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c > index 8332d5694e46..247fd48731e2 100644 > --- a/hw/ppc/spapr_pci_nvlink2.c > +++ b/hw/ppc/spapr_pci_nvlink2.c > @@ -362,7 +362,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, > void *fdt) > uint32_t associativity[] = { > cpu_to_be32(0x4), > SPAPR_GPU_NUMA_ID, > - SPAPR_GPU_NUMA_ID, > + cpu_to_be32(nvslot->numa_id), This is a guest visible change. It should theoretically be controlled with a compat property of the PHB (look for "static GlobalProperty" in spapr.c). But since this code is only used for GPU passthrough and we don't support migration of such devices, I guess it's okay. Maybe just mention it in the changelog. > SPAPR_GPU_NUMA_ID, > cpu_to_be32(nvslot->numa_id) > };