On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid
> got from device tree is naturally mapped (directly) to nid.

chipid is an OPAL concept, I believe, and is not documented in PAPR...
How does this work under PowerVM?

> Potential side effect of that is:
>
> 1) There are several places in kernel that assumes serial node numbering.
> and memory allocations assume that all the nodes from 0-(highest nid)
> exist inturn ending up allocating memory for the nodes that does not exist.
>
> 2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
> sparse nid of the host system to contiguous nids of guest (numa affinity,
> placement) could be a challenge.
>
> Possible Solutions:
> 1) Handling the memory allocations is kernel case by case: Though in some
> cases it is easy to achieve, some cases may be intrusive/not trivial.
> at the end it does not handle side effect (2) above.
>
> 2) Map the sparse chipid got from device tree to a serial nid at kernel
> level (The idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at lower (OPAL) layer.
> con: The chipid is in device tree no longer the same as nid in kernel

Is there any debugging/logging? It looks like there isn't, so how does a
sysadmin map from the firmware-provided values to the Linux values? That
is going to make debugging of large systems (PowerVM or otherwise) less
than pleasant, it seems. Possibly you could put something in sysfs?

-Nish
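
To make the question above concrete, here is a minimal userspace sketch of the kind of chipid-to-nid table a sysadmin would need to see. It is illustrative only: the function names are hypothetical, the ascending-chipid assignment order is an assumption (the series may well assign nids in discovery order instead), and no such sysfs file or log line exists today as far as this thread shows.

```python
# Illustrative sketch only: names and assignment order are assumptions,
# not taken from the patch series.

def build_chipid_to_nid(chipids):
    """Map sparse firmware chip ids to contiguous Linux node ids.

    Assumption: nids are handed out in ascending chipid order.
    """
    mapping = {}
    for chipid in sorted(set(chipids)):
        mapping[chipid] = len(mapping)  # next unused contiguous nid
    return mapping


def format_mapping(mapping):
    """Render the table a hypothetical sysfs file or boot log could expose."""
    lines = ["chipid nid"]
    for chipid, nid in sorted(mapping.items()):
        lines.append(f"{chipid:>6} {nid:>3}")
    return "\n".join(lines)


# The 4-node example from the problem description: chip ids 0, 1, 16, 17.
chipid_to_nid = build_chipid_to_nid([0, 1, 16, 17])

# Reverse lookup: what a sysadmin needs when a Linux message names a nid
# and the firmware tools name a chipid.
nid_to_chipid = {nid: chipid for chipid, nid in chipid_to_nid.items()}

print(format_mapping(chipid_to_nid))
```

With something equivalent to the reverse table available from the running kernel, an error reported against nid 2 could be traced back to chipid 16 without guessing at the assignment order.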