"Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes: > On 8/8/20 2:15 AM, Nathan Lynch wrote: >> "Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes: >>> On 8/7/20 9:54 AM, Nathan Lynch wrote: >>>> "Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes: >>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >>>>> index e437a9ac4956..6c659aada55b 100644 >>>>> --- a/arch/powerpc/mm/numa.c >>>>> +++ b/arch/powerpc/mm/numa.c >>>>> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int >>>>> nid, >>>>> } >>>>> } >>>>> >>>>> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = >>>>> NUMA_NO_NODE}; >>>> >>>> It's odd to me to use MAX_NUMNODES for this array when it's going to be >>>> indexed not by Linux's logical node IDs but by the platform-provided >>>> domain number, which has no relation to MAX_NUMNODES. >>> >>> >>> I didn't want to dynamically allocate this. We could fetch >>> "ibm,max-associativity-domains" to find the size for that. The current >>> code do assume firmware group id to not exceed MAX_NUMNODES. Hence kept >>> the array size to be MAX_NUMNODEs. I do agree that it is confusing. May >>> be we can do #define MAX_AFFINITY_DOMAIN MAX_NUMNODES? >> >> Well, consider: >> >> - ibm,max-associativity-domains can change at runtime with LPM. This >> doesn't happen in practice yet, but we should probably start thinking >> about how to support that. >> - The domain numbering isn't clearly specified to have any particular >> properties such as beginning at zero or a contiguous range. >> >> While the current code likely contains assumptions contrary to these >> points, a change such as this is an opportunity to think about whether >> those assumptions can be reduced or removed. In particular I think it >> would be good to gracefully degrade when the number of NUMA affinity >> domains can exceed MAX_NUMNODES. Using the platform-supplied domain >> numbers to directly index Linux data structures will make that >> impossible. >> >> So, maybe genradix or even xarray wouldn't actually be overengineering >> here. >> > > One of the challenges with such a data structure is that we initialize > the nid_map before the slab is available. This means a memblock based > allocation and we would end up implementing such a sparse data structure > ourselves here. > > As you mentioned above, since we know that hypervisor as of now limits > the max affinity domain id below ibm,max-associativity-domains we are > good with an array-like nid_map we have here. This keeps the code simpler. > > This will also allow us to switch to a more sparse data structure as you > requested here in the future because the main change that is pushed in > this series is the usage of firmare_group_id_to_nid(). The details of > the data structure we use to keep track of that mapping are pretty much > internal to that function.
How about this? It no longer uses the domain id as a direct index, but it
does limit the search to the NUMA nodes assigned so far on the system.

static int domain_id_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = -1 };

static int __affinity_domain_to_nid(int domain_id, int max_nid)
{
        int i;

        for (i = 0; i < max_nid; i++) {
                if (domain_id_map[i] == domain_id)
                        return i;
        }
        return NUMA_NO_NODE;
}

int affinity_domain_to_nid(struct affinity_domain *domain)
{
        int nid, domain_id;
        static int last_nid = 0;
        static DEFINE_SPINLOCK(node_id_lock);

        domain_id = domain->id;
        /*
         * For PowerNV we don't change the node id. This helps to avoid
         * confusion w.r.t the expected node ids. On pseries, node numbers
         * are virtualized. Hence do logical node id for pseries.
         */
        if (!firmware_has_feature(FW_FEATURE_LPAR))
                return domain_id;

        if (domain_id == -1 || last_nid == MAX_NUMNODES)
                return NUMA_NO_NODE;

        nid = __affinity_domain_to_nid(domain_id, last_nid);
        if (nid == NUMA_NO_NODE) {
                spin_lock(&node_id_lock);
                /* recheck with lock held */
                nid = __affinity_domain_to_nid(domain_id, last_nid);
                if (nid == NUMA_NO_NODE) {
                        nid = last_nid++;
                        domain_id_map[nid] = domain_id;
                }
                spin_unlock(&node_id_lock);
        }
        return nid;
}
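For context, a caller would then look something like this (hypothetical
sketch, mirroring what associativity_to_nid() in numa.c does today, with
the numa_enabled and bounds checks elided):

static int associativity_to_nid(const __be32 *associativity)
{
        struct affinity_domain domain = { .id = NUMA_NO_NODE };

        /* Pick up the firmware domain id at min_common_depth, as today. */
        if (of_read_number(associativity, 1) >= min_common_depth)
                domain.id = of_read_number(&associativity[min_common_depth], 1);

        return affinity_domain_to_nid(&domain);
}

The point being that the lookup data structure stays internal to
affinity_domain_to_nid(), so switching to something sparse later wouldn't
change the callers.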