On 9/2/2024 3:08 AM, Varghese, Vipin wrote:
<Snipped>
Thank you Anatoly for the response. Let me try to share my understanding.
I recently looked into how Intel's Sub-NUMA Clustering would work within
DPDK, and found that I actually didn't have to do anything, because the
SNC "clusters" present themselves as NUMA nodes, which DPDK already
supports natively.
Yes, this is correct. In the Intel Xeon Platinum BIOS one can set
`Cluster per NUMA` to `1, 2 or 4`.
This divides the tiles into Sub-NUMA partitions, each having separate
lcores, memory controllers, PCIe and accelerators.
Does AMD's implementation of chiplets not report themselves as separate
NUMA nodes?
In AMD EPYC SoC, this is different. There are 2 BIOS settings, namely
1. NPS: `Numa Per Socket` which allows the IO tile (memory, PCIe and
Accelerator) to be partitioned as Numa 0, 1, 2 or 4.
2. L3 as NUMA: `L3 cache of CPU tiles as individual NUMA`. This allows
each CPU tile to appear as an independent NUMA node.
The above settings are possible because the CPU tiles are independent
from the IO tile, which makes 4 combinations available for use.
Sure, but presumably if the user wants to distinguish this, they have to
configure their system appropriately. If user wants to take advantage of
L3 as NUMA (which is what your patch proposes), then they can enable the
BIOS knob and get that functionality for free. DPDK already supports this.
These are covered in the tuning guide for the SoC, in "12. How to get
best performance on AMD platform" (Data Plane Development Kit 24.07.0
documentation, dpdk.org)
<https://doc.dpdk.org/guides/linux_gsg/amd_platform.html>.
Because if it does, I don't really think any changes are
required because NUMA nodes would give you the same thing, would it not?
I have a different opinion to this outlook. An end user can
1. Identify the lcores and their NUMA nodes using `usertools/cpu-layout.py`
I recently submitted an enhancement for the CPU layout script to print out
NUMA separately from physical socket [1].
[1]
https://patches.dpdk.org/project/dpdk/patch/40cf4ee32f15952457ac5526cfce64728bd13d32.1724323106.git.anatoly.bura...@intel.com/
I believe when "L3 as NUMA" is enabled in BIOS, the script will display
both physical package ID as well as NUMA nodes reported by the system,
which will be different from physical package ID, and which will display
information you were looking for.
2. But it is the core mask in the EAL arguments which makes the threads
available to be used in a process.
See above: if the OS already reports NUMA information, this is not a
problem to be solved, CPU layout script can give this information to the
user.
3. There is no API which distinguishes the L3 NUMA domain. The function
`rte_socket_id
<https://doc.dpdk.org/api/rte__lcore_8h.html#a7c8da4664df26a64cf05dc508a4f26df>` for CPU tiles like AMD SoC will return the physical socket.
Sure, but I would think the answer to that would be to introduce an API
to distinguish between NUMA (socket ID in DPDK parlance) and package
(physical socket ID in the "traditional NUMA" sense). Once we can
distinguish between those, DPDK can just rely on NUMA information
provided by the OS, while still being capable of identifying physical
sockets if the user so desires.
I am actually going to introduce an API to get the *physical socket* (as
opposed to NUMA node) in the next few days.
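
To make the NUMA node vs. physical package distinction concrete, here is a
rough sketch (not taken from the patch, and not how EAL or
`usertools/cpu-layout.py` is actually written) that reads both values for
each CPU from Linux sysfs. The function name `cpu_topology` and the fallback
to node 0 when no `nodeN` entry exists are assumptions made for illustration.

```python
import os
import re

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def cpu_topology(root=SYSFS_CPU_ROOT):
    """Return {cpu_id: (numa_node, package_id)} as reported by the kernel.

    Hypothetical helper for illustration only.
    """
    result = {}
    if not os.path.isdir(root):
        return result  # not Linux, or sysfs not mounted
    for entry in os.listdir(root):
        match = re.fullmatch(r"cpu(\d+)", entry)
        if not match:
            continue  # skip cpufreq, cpuidle, online, ...
        cpu_dir = os.path.join(root, entry)
        pkg_path = os.path.join(cpu_dir, "topology", "physical_package_id")
        if not os.path.exists(pkg_path):
            continue  # topology is not exposed for offline CPUs
        with open(pkg_path) as f:
            package = int(f.read().strip())
        # The kernel links each CPU to its NUMA node as cpuN/nodeM.
        # Assumption: fall back to node 0 when no such entry exists.
        numa = 0
        for sub in os.listdir(cpu_dir):
            node = re.fullmatch(r"node(\d+)", sub)
            if node:
                numa = int(node.group(1))
                break
        result[int(match.group(1))] = (numa, package)
    return result


if __name__ == "__main__":
    for cpu, (numa, package) in sorted(cpu_topology().items()):
        print(f"cpu {cpu}: NUMA node {numa}, physical package {package}")
```

With "L3 as NUMA" enabled, one would expect several NUMA node values to show
up under the same physical package value.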
Example: In AMD EPYC Genoa, there are a total of 13 tiles: 12 CPU tiles
and 1 IO tile. Setting
1. NPS to 4 will divide the memory, PCIe and accelerator into 4 domains,
while all the CPUs will appear as a single NUMA node, with each of the
12 tiles having an independent L3 cache.
2. `L3 as NUMA` allows each tile to appear as a separate L3 cluster.
Hence, adding an API which allows selecting available lcores based on
Split L3 is essential irrespective of the BIOS setting.
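
For illustration only, grouping CPUs by shared L3 independently of the BIOS
NUMA settings could look roughly like the sketch below, which reads the
kernel's cache topology from sysfs. This is not the API proposed in the
patch; `parse_cpu_list` and `l3_groups` are hypothetical names, and the
sketch assumes the kernel exposes `cache/index*/level` and
`shared_cpu_list` for every online CPU.

```python
import glob
import os

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-7,96-103' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def l3_groups(root=SYSFS_CPU_ROOT):
    """Return sorted lists of CPU ids, one list per shared L3 cache.

    Hypothetical helper for illustration only.
    """
    groups = set()
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue  # only level 3 caches are of interest here
        shared_path = os.path.join(os.path.dirname(level_path),
                                   "shared_cpu_list")
        with open(shared_path) as f:
            cpus = parse_cpu_list(f.read())
        if cpus:
            # Every CPU sharing the cache reports the same list, so the
            # set removes the duplicates.
            groups.add(frozenset(cpus))
    return sorted((sorted(g) for g in groups), key=lambda g: g[0])


if __name__ == "__main__":
    for idx, cpus in enumerate(l3_groups()):
        print(f"L3 group {idx}: {cpus}")
```

Each resulting group could then be turned into a core list for the EAL
arguments by whoever launches the application.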
I think the crucial issue here is the "irrespective of BIOS setting"
bit. If EAL is getting into the game of figuring out exact intricacies
of physical layout of the system, then there's a lot more work to be
done as there are lots of different topologies, as other people have
already commented, and such an API needs *a lot* of thought put into it.
If, on the other hand, we leave this issue to the kernel, and only
gather NUMA information provided by the kernel, then nothing has to be
done - DPDK already supports all of this natively, provided the user has
configured the system correctly.
Moreover, arguably DPDK already works that way: technically you can get
physical socket information even in the absence of NUMA support in BIOS,
but DPDK does not do that. Instead, if OS reports NUMA node as 0, that's
what we're going with (even if we could detect multiple sockets from
sysfs), and IMO it should stay that way unless there is a strong
argument otherwise. We force the user to configure their system
correctly as it is, and I see no reason to second-guess user's BIOS
configuration otherwise.
--
Thanks,
Anatoly