On 14/09/16 09:29, Michael Roth wrote: > Quoting Alexey Kardashevskiy (2016-07-27 03:03:38) >> This adds a numa id property to a PHB to allow linking passed PCI device >> to CPU/memory. It is up to the management stack to do CPU/memory pinning >> to the node with the actual PCI device. > > It looks like x86 relies on PCIBus->numa_node() method (via > pci_bus_numa_node()) to encode similar PCIBus affinities > into ACPI tables, and currently exposes it via > -device pci-[-express]-expander-bus,numa_node=X.
Well, until we allow DMA windows per PCI bus (not per PHB as it is now), this won't make much sense for us (unless I am missing something here). > Maybe we should implement it using this existing > PCIBus->numa_node() interface? > > We'd still have to expose numa_node as a spapr-pci-host-bridge > device option though. Not sure if there's a more common way > to expose it that might be easier for libvirt to discover. As it > stands we'd need to add spapr-pci-host-bridge to a libvirt > whitelist that currently only covers pci-expander-bus. > > Cc'ing Shiva who was looking into the libvirt side. > > One comment below: > >> >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >> --- >> hw/ppc/spapr_pci.c | 13 +++++++++++++ >> include/hw/pci-host/spapr.h | 2 ++ >> 2 files changed, 15 insertions(+) >> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >> index 949c44f..af5394a 100644 >> --- a/hw/ppc/spapr_pci.c >> +++ b/hw/ppc/spapr_pci.c >> @@ -47,6 +47,7 @@ >> #include "sysemu/device_tree.h" >> #include "sysemu/kvm.h" >> #include "sysemu/hostmem.h" >> +#include "sysemu/numa.h" >> >> #include "hw/vfio/vfio.h" >> >> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] = { >> DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true), >> DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask, >> (1ULL << 12) | (1ULL << 16)), >> + DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1), >> DEFINE_PROP_END_OF_LIST(), >> }; >> >> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, >> cpu_to_be32(1), >> cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW) >> }; >> + uint32_t associativity[] = {cpu_to_be32(0x4), >> + cpu_to_be32(0x0), >> + cpu_to_be32(0x0), >> + cpu_to_be32(0x0), >> + cpu_to_be32(phb->numa_node)}; >> sPAPRTCETable *tcet; >> PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus; >> sPAPRFDT s_fdt; >> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, >> &ddw_extensions, sizeof(ddw_extensions))); >> } >> >> + /* Advertise NUMA via ibm,associativity */ >> + if (nb_numa_nodes > 1) { >> + _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associativity, >> + sizeof(associativity))); >> + } > > LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is > required alongside ibm,associativity for each DT node it appears in, > and since we hardcode "Form 1" affinity it should be done similarly as > the entry we place in the top-level DT node. Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s in spapr_create_fdt_skel()'s refpoints? Just a random pick? vec5[5]==0x80 means we are doing "Form1" and these 4s are far/near distances? >> + >> /* Build the interrupt-map, this must matches what is done >> * in pci_spapr_map_irq >> */ >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h >> index 5adc603..53c4b2d 100644 >> --- a/include/hw/pci-host/spapr.h >> +++ b/include/hw/pci-host/spapr.h >> @@ -75,6 +75,8 @@ struct sPAPRPHBState { >> bool ddw_enabled; >> uint64_t page_size_mask; >> uint64_t dma64_win_addr; >> + >> + uint32_t numa_node; >> }; >> >> #define SPAPR_PCI_MAX_INDEX 255 >> -- >> 2.5.0.rc3 >> > -- Alexey