> Hello Vipin and others,
>
> please, will there be any progress or update on this series?

Apologies, we posted a small update on Slack and missed sharing it here.
Let me try to address your questions below.

> I successfully tested those changes on our Intel and AMD machines and
> would like to use it in production soon.
>
> The API is a little bit unintuitive, at least for me, but I successfully
> integrated into our software.
>
> I am missing a clear relation to the NUMA socket approach used in DPDK.
> E.g. I would like to be able to easily walk over a list of lcores from a
> specific NUMA node grouped by L3 domain. Yes, there is the
> RTE_LCORE_DOMAIN_IO, but would it always match the appropriate socket IDs?

Yes, we at AMD were internally debating the same. But since the lcore API
already provides `rte_lcore_to_socket_id`, adding yet another variation or
argument did not seem worthwhile. Our reasoning was that, when using the new
API, the application can itself check whether an lcore belongs to the desired
physical socket or sub-socket NUMA domain. Hence, we did not add the option.
(Please see the sketch at the end of this mail for how the two can be
combined.)

> Also, I do not clearly understand what is the purpose of using domain
> selector like:
>
> RTE_LCORE_DOMAIN_L1 | RTE_LCORE_DOMAIN_L2
>
> or even:
>
> RTE_LCORE_DOMAIN_L3 | RTE_LCORE_DOMAIN_L2

I believe the documentation mentions that only one domain selector should be
chosen; if multiple are combined, only one of them is picked up based on the
code flow. The real use of these selectors is to pick physical cores under
the same cache or IO domain. Example: certain SoCs have 4 cores sharing an
L2, which makes pipeline processing more convenient (less data movement). In
such cases the application selects lcores within the same L2 topology.

> the documentation does not explain this. I could not spot any kind of
> grouping that would help me in any way. Some "best practices" examples
> would be nice to have to understand the intentions better.

From
https://patches.dpdk.org/project/dpdk/cover/20241105102849.1947-1-vipin.vargh...@amd.com/

```
Reason:
 - Applications using DPDK libraries relies on consistent memory access.
 - Lcores being closer to same NUMA domain as IO.
 - Lcores sharing same cache.

Latency is minimized by using lcores that share the same NUMA topology.
Memory access is optimized by utilizing cores within the same NUMA domain or
tile. Cache coherence is preserved within the same shared cache domain,
reducing the remote access from tile|compute package via snooping (local hit
in either L2 or L3 within same NUMA domain).
```

> I found a little catch when running DPDK with more lcores than there are
> physical or SMT CPU cores. This happens when using e.g. an option like
> --lcores=(0-15)@(0-1). The results from the topology API would not match
> the lcores because hwloc is not aware of the lcores concept. This might be
> mentioned somewhere.

Yes, this is expected, as one can map any CPU core to a DPDK lcore with the
`--lcores` mapping. We did mention this in RFC v4, but missed carrying the
note over when we moved to RFC v5.

> Anyway, I really appreciate this work and would like to see it upstream.
> Especially for AMD machines, some framework like this is a must.
>
> Kind regards,
> Jan

We are planning to drop the RFC tag and share the final version for the
upcoming DPDK release shortly.
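
PS: a minimal sketch of combining the existing socket API with the new
domain walk, i.e. listing the lcores of one NUMA socket grouped by L3
domain. `rte_lcore_to_socket_id()` is the existing lcore API; the domain
helpers (`rte_get_domain_count()`, `rte_lcore_count_from_domain()`,
`rte_get_lcore_in_domain()`) and `RTE_LCORE_DOMAIN_L3` follow the naming in
this RFC series and are placeholders here, so please take the exact names
and signatures from the latest patches.

```
#include <stdio.h>
#include <rte_lcore.h>

/*
 * Sketch: list the lcores of one NUMA socket, grouped by L3 domain.
 * The domain helpers below are placeholders following this RFC series;
 * rte_lcore_to_socket_id() is the existing lcore API.
 */
static void
dump_socket_lcores_by_l3(unsigned int socket_id)
{
	unsigned int dom_count = rte_get_domain_count(RTE_LCORE_DOMAIN_L3);
	unsigned int dom, pos;

	for (dom = 0; dom < dom_count; dom++) {
		unsigned int lc_count =
			rte_lcore_count_from_domain(RTE_LCORE_DOMAIN_L3, dom);

		for (pos = 0; pos < lc_count; pos++) {
			unsigned int lcore =
				rte_get_lcore_in_domain(RTE_LCORE_DOMAIN_L3,
							dom, pos);

			/* keep only lcores on the requested physical socket */
			if (rte_lcore_to_socket_id(lcore) != socket_id)
				continue;

			printf("socket %u, L3 domain %u: lcore %u\n",
			       socket_id, dom, lcore);
		}
	}
}
```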