On 6/24/21 4:03 PM, Laurent Dufour wrote:
Hi Aneesh,

A little bit of wordsmithing below...

On 17/06/2021 at 18:51, Aneesh Kumar K.V wrote:
PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb...@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
---
  Documentation/powerpc/associativity.rst   | 135 ++++++++++++++++++++
  arch/powerpc/include/asm/firmware.h       |   3 +-
  arch/powerpc/include/asm/prom.h           |   1 +
  arch/powerpc/kernel/prom_init.c           |   3 +-
  arch/powerpc/mm/numa.c                    | 149 +++++++++++++++++++++-
  arch/powerpc/platforms/pseries/firmware.c |   1 +
  6 files changed, 286 insertions(+), 6 deletions(-)
  create mode 100644 Documentation/powerpc/associativity.rst

diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst
new file mode 100644
index 000000000000..93be604ac54d
--- /dev/null
+++ b/Documentation/powerpc/associativity.rst
@@ -0,0 +1,135 @@
+===========================
+NUMA resource associativity
+===========================
+
+Associativity represents the groupings of the various platform resources into
+domains of substantially similar mean performance relative to resources outside
+of that domain. Resources subsets of a given domain that exhibit better
+performance relative to each other than relative to other resources subsets
+are represented as being members of a sub-grouping domain. This performance
+characteristic is presented in terms of NUMA node distance within the Linux kernel.
+From the platform view, these groups are also referred to as domains.
+
+PAPR interface currently supports different ways of communicating these resource
+grouping details to the OS. These are referred to as Form 0, Form 1 and Form2
+associativity grouping. Form 0 is the older format and is now considered deprecated.
+
+Hypervisor indicates the type/form of associativity used via "ibm,arcitecture-vec-5 property".
                                                            architecture ^


fixed

+Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
+A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity
+bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
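
As an illustration of the bit test described above, here is a minimal userspace sketch, not the kernel's implementation. It assumes PAPR-style big-endian bit numbering, in which bit 0 is the most significant bit of a byte (so bit 0 maps to mask 0x80 and bit 2 to mask 0x20). The helper names `vec5_bit_set` and `affinity_form` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 is the most significant bit of the byte. */
enum { AFFINITY_BYTE = 5, FORM1_BIT = 0, FORM2_BIT = 2 };

/* Test one bit of one byte of an option vector, with bounds checking. */
static bool vec5_bit_set(const uint8_t *vec5, size_t len,
                         size_t byte, unsigned int bit)
{
    if (byte >= len || bit > 7)
        return false;
    return (vec5[byte] & (0x80u >> bit)) != 0;
}

/* Return 2, 1 or 0 for the associativity form advertised in vec5. */
static int affinity_form(const uint8_t *vec5, size_t len)
{
    if (vec5_bit_set(vec5, len, AFFINITY_BYTE, FORM2_BIT))
        return 2;
    if (vec5_bit_set(vec5, len, AFFINITY_BYTE, FORM1_BIT))
        return 1;
    return 0;
}
```

Checking Form 2 first is a choice made for this sketch: when both bits are set, the newer form wins.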
+
+Form 0
+------
+Form 0 associativity supports only two NUMA distance (LOCAL and REMOTE).
+
+Form 1
+------
+With Form 1 a combination of ibm,associativity-reference-points and ibm,associativity
+device tree properties are used to determine the NUMA distance between resource groups/domains.
+
+The “ibm,associativity” property contains one or more lists of numbers (domainID)
+representing the resource’s platform grouping domains.
+
+The “ibm,associativity-reference-points” property contains one or more list of numbers
+(domainID index) that represents the 1 based ordinal in the associativity lists.
+The list of domainID index represnets increasing hierachy of resource grouping.
                         represents ^


fixed

+
+ex:
+{ primary domainID index, secondary domainID index, tertiary domainID index.. }
+
+Linux kernel uses the domainID at the primary domainID index as the NUMA node id.
+Linux kernel computes NUMA distance between two domains by recursively comparing
+if they belong to the same higher-level domains. For mismatch at every higher
+level of the resource group, the kernel doubles the NUMA distance between the
+comparing domains.
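
The Form 1 doubling rule described above can be sketched as follows. This is a simplified userspace illustration rather than the kernel code. It assumes that each resource's domainIDs have already been picked out at the reference-point indices and stored primary first, and that the base distance is 10, matching the usual local NUMA distance in Linux. The function name `form1_distance` is hypothetical.

```c
#include <stdint.h>

#define BASE_DISTANCE 10 /* assumption: same value as the local distance */

/*
 * a and b hold the domainIDs of two resources at each reference point,
 * ordered primary, secondary, tertiary, ... (depth entries each).
 * The distance doubles for every level that mismatches, and the walk
 * stops at the first level where both resources share a domain.
 */
static int form1_distance(const uint32_t *a, const uint32_t *b, int depth)
{
    int distance = BASE_DISTANCE;

    for (int i = 0; i < depth; i++) {
        if (a[i] == b[i])
            break;
        distance *= 2;
    }
    return distance;
}
```

With two reference points, resources that differ only at the primary level end up at distance 20, and resources that differ at both levels end up at distance 40.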
+
+Form 2
+-------
+Form 2 associativity format adds separate device tree properties representing NUMA node distance
+thereby making the node distance computation flexible. Form 2 also allows flexible primary
+domain numbering. With numa distance computation now detached from the index value of
+"ibm,associativity" property, Form 2 allows a large number of primary domain ids at the
+same domainID index representing resource groups of different performance/latency characteristics.
+
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
+"ibm,architecture-vec-5" property.
+
+"ibm,numa-lookup-index-table" property contains one or more list numbers representing
+the domainIDs present in the system. The offset of the domainID in this property is considered
+the domainID index.
+
+prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
+N domainID encoded as with encode-int
+
+For ex:
+ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for domainID 8 is 1.
+
+"ibm,numa-distance-table" property contains one or more list of numbers representing the NUMA
+distance between resource groups/domains present in the system.
+
+prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
+
+For ex:
+ibm,numa-lookup-index-table =  {3, 0, 8, 40}
+ibm,numa-distance-table     =  {9, 10, 20, 80, 20, 10, 160, 80, 160, 10}
+
+  | 0    8   40
+--|------------
+  |
+0 | 10   20  80
+  |
+8 | 20   10  160
+  |
+40| 80   160  10
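
To show how the two properties in this example combine into the matrix above, here is a minimal userspace sketch. It assumes the property contents have already been converted to host-endian values, that the distance bytes are laid out row-major in lookup-table order, and that domainIDs fit in a fixed-size table. `MAX_DOMAINS`, `distance_table` and `fill_distance_table` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DOMAINS 256 /* hypothetical bound for this sketch */

static uint8_t distance_table[MAX_DOMAINS][MAX_DOMAINS];

/*
 * lookup:    { N, id_0, ..., id_(N-1) } as in "ibm,numa-lookup-index-table"
 * distances: the N * N distance bytes of "ibm,numa-distance-table",
 *            without the leading element count
 */
static void fill_distance_table(const uint32_t *lookup,
                                const uint8_t *distances)
{
    uint32_t n = lookup[0];
    const uint32_t *ids = lookup + 1;

    /* Reject domainIDs that do not fit in the table. */
    for (uint32_t i = 0; i < n; i++)
        assert(ids[i] < MAX_DOMAINS);

    /* Entry (i, j) is the distance from domain ids[i] to domain ids[j]. */
    for (uint32_t i = 0; i < n; i++)
        for (uint32_t j = 0; j < n; j++)
            distance_table[ids[i]][ids[j]] = distances[i * n + j];
}
```

Feeding it the lookup table {3, 0, 8, 40} and the nine distance bytes from the example reproduces the table shown above, for instance a distance of 160 between domains 8 and 40.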
+
+
+"ibm,associativity" property for resources in node 0, 8 and 40
+
+{ 3, 6, 7, 0 }
+{ 3, 6, 9, 8 }
+{ 3, 6, 7, 40}
+
+With "ibm,associativity-reference-points"  { 0x3 }
+
+Each resource (drcIndex) now also supports additional optional device tree properties.
+These properties are marked optional because the platform can choose not to export
+them and provide the system topology details using the earlier defined device tree
+properties alone. The optional device tree properties are used when adding new resources
+(DLPAR) and when the platform didn't provide the topology details of the domain which
+contains the newly added resource during boot.
+
+"ibm,numa-lookup-index" property contains a number representing the domainID index to be used
+when building the NUMA distance of the numa node to which this resource belongs. This can
+be looked at as the index at which this new domainID would have appeared in
+"ibm,numa-lookup-index-table" if the domain was present during boot. The domainID
+of the new resource can be obtained from the existing "ibm,associativity" property. This
+can be used to build distance information of a newly onlined NUMA node via DLPAR operation.
+The value is 1 based array index value.
+
+prop-encoded-array: An integer encoded as with encode-int specifying the domainID index
+
+"ibm,numa-distance" property contains one or more list of numbers presenting the NUMA distance
+from this resource domain to other resources.
+
+prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
+
+For ex:
+ibm,associativity     = { 4, 5, 10, 50}

Is this missing the first element of the property (the length), or an associativity number?


That should be { 3, 5, 10, 50 }. Fixed.

+ibm,numa-lookup-index = { 4 }
+ibm,numa-distance   =  {8, 160, 255, 80, 10, 160, 255, 80, 10}
+
+resulting in a new toplogy as below.
+  | 0    8   40   50
+--|------------------
+  |
+0 | 10   20  80   160
+  |
+8 | 20   10  160  255
+  |
+40| 80   160  10  80
+  |
+50| 160  255  80  10
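
The DLPAR update in this example can be sketched in userspace as follows. It assumes the first half of the "ibm,numa-distance" values gives the distance from each existing domain to the new one and the second half gives the distance from the new domain to each existing one, both in lookup-index order, which matches the example and the comment in the code further down. All names here (`MAX_DOMAINS`, `id_at_index`, `distance_table`, `add_domain`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DOMAINS 256 /* hypothetical bound for this sketch */

static uint32_t id_at_index[MAX_DOMAINS]; /* lookup index - 1 -> domainID */
static uint8_t distance_table[MAX_DOMAINS][MAX_DOMAINS];

/*
 * nid:          domainID of the new resource, from "ibm,associativity"
 * lookup_index: 1-based value of "ibm,numa-lookup-index"
 * dist, count:  distance bytes of "ibm,numa-distance" and their number,
 *               without the leading element count
 */
static void add_domain(uint32_t nid, uint32_t lookup_index,
                       const uint8_t *dist, uint32_t count)
{
    uint32_t half = count / 2;

    assert(nid < MAX_DOMAINS);
    assert(lookup_index >= 1 && lookup_index <= MAX_DOMAINS);
    assert(count % 2 == 0 && half <= MAX_DOMAINS);

    /* The device tree treats the lookup table as a 1-based array. */
    id_at_index[lookup_index - 1] = nid;

    /* First half: distance from every domain to the new domain. */
    for (uint32_t i = 0; i < half; i++)
        distance_table[id_at_index[i]][nid] = dist[i];

    /* Second half: distance from the new domain to every domain. */
    for (uint32_t i = 0; i < half; i++)
        distance_table[nid][id_at_index[i]] = dist[half + i];
}
```

Starting from lookup entries 0, 8 and 40, calling `add_domain(50, 4, ...)` with the eight distance values from the example fills in the row and column for domain 50 as in the table above.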
+
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 60b631161360..97a3bd9ffeb9 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h


...

+    numa_distancep = of_get_property(node, "ibm,numa-distance", NULL);
+    if (!numa_distancep)
+        return;
+
+    numa_indexp = of_get_property(node, "ibm,numa-lookup-index", NULL);
+    if (!numa_indexp)
+        return;
+
+    numa_index = of_read_number(numa_indexp, 1);
+    /*
+     * update the numa_id_index_table. Device tree look at index table as
+     * 1 based array indexing.
+     */
+    numa_id_index_table[numa_index - 1] = nid;
+
+    max_numa_index = of_read_number((const __be32 *)numa_distancep, 1);
+    VM_WARN_ON(max_numa_index != 2 * numa_index);

Could you briefly explain the meaning of this VM_WARN_ON check in a comment?


Based on the other review feedback, this check has been dropped. We now derive the domain distance offset from the number of elements in "ibm,numa-distance".

+    /* Skip the size which is encoded int */
+    numa_distancep += sizeof(__be32);
+
+    /*
+     * First fill the distance information from other node to this node.
+     */
+    other_nid_index = 0;
+    for (i = 0; i < numa_index; i++) {
+        numa_distance = numa_distancep[i];
+        other_nid = numa_id_index_table[other_nid_index++];
+        numa_distance_table[other_nid][nid] = numa_distance;
+    }
+
+    other_nid_index = 0;
+    for (; i < max_numa_index; i++) {
+        numa_distance = numa_distancep[i];
+        other_nid = numa_id_index_table[other_nid_index++];
+        numa_distance_table[nid][other_nid] = numa_distance;
+    }
+}
+

Thanks for reviewing the patch.

-aneesh
