> Hello Vipin and others,
>
> please, will there be any progress or update on this series?

Apologies, we posted a small update on Slack and missed sharing it here.
Let me try to address your questions below.

> I successfully tested those changes on our Intel and AMD machines and
> would like to use it in production soon.
>
> The API is a little bit unintuitive, at least for me, but I successfully
> integrated into our software.
>
> I am missing a clear relation to the NUMA socket approach used in DPDK.
> E.g. I would like to be able to easily walk over a list of lcores from a
> specific NUMA node grouped by L3 domain. Yes, there is the
> RTE_LCORE_DOMAIN_IO, but would it always match the appropriate socket IDs?

Yes, we at AMD were internally debating the same. But since the lcore API
already provides `rte_lcore_to_socket_id`, adding yet another variation or
argument did not seem worthwhile. Our reasoning was that, when using the new
API, the application can itself check whether an lcore belongs to the desired
physical socket or sub-socket NUMA domain. Hence, we did not add the option.
(Please see the sketch at the end of this mail for how the two can be
combined.)

> Also, I do not clearly understand what is the purpose of using domain
> selector like:
>
> RTE_LCORE_DOMAIN_L1 | RTE_LCORE_DOMAIN_L2
>
> or even:
>
> RTE_LCORE_DOMAIN_L3 | RTE_LCORE_DOMAIN_L2

I believe the documentation mentions that only one domain selector should be
chosen; if multiple are combined, only one of them is picked up based on the
code flow. The real use of these selectors is to pick physical cores under
the same cache or IO domain. Example: certain SoCs have 4 cores sharing an
L2, which makes pipeline processing more convenient (less data movement). In
such cases the application selects lcores within the same L2 topology.

> the documentation does not explain this. I could not spot any kind of
> grouping that would help me in any way. Some "best practices" examples
> would be nice to have to understand the intentions better.

From
https://patches.dpdk.org/project/dpdk/cover/20241105102849.1947-1-vipin.vargh...@amd.com/

```
Reason:
 - Applications using DPDK libraries relies on consistent memory access.
 - Lcores being closer to same NUMA domain as IO.
 - Lcores sharing same cache.

Latency is minimized by using lcores that share the same NUMA topology.
Memory access is optimized by utilizing cores within the same NUMA domain or
tile. Cache coherence is preserved within the same shared cache domain,
reducing the remote access from tile|compute package via snooping (local hit
in either L2 or L3 within same NUMA domain).
```

> I found a little catch when running DPDK with more lcores than there are
> physical or SMT CPU cores. This happens when using e.g. an option like
> --lcores=(0-15)@(0-1). The results from the topology API would not match
> the lcores because hwloc is not aware of the lcores concept. This might be
> mentioned somewhere.

Yes, this is expected, as one can map any CPU core to a DPDK lcore with the
`--lcores` mapping. We did mention this in RFC v4, but missed carrying the
note over when we moved to RFC v5.

> Anyway, I really appreciate this work and would like to see it upstream.
> Especially for AMD machines, some framework like this is a must.
>
> Kind regards,
> Jan

We are planning to drop the RFC tag and share the final version for the
upcoming DPDK release shortly.
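
PS: a minimal sketch of combining the existing socket API with the new
domain walk, i.e. listing the lcores of one NUMA socket grouped by L3
domain. `rte_lcore_to_socket_id()` is the existing lcore API; the domain
helpers (`rte_get_domain_count()`, `rte_lcore_count_from_domain()`,
`rte_get_lcore_in_domain()`) and `RTE_LCORE_DOMAIN_L3` follow the naming in
this RFC series and are placeholders here, so please take the exact names
and signatures from the latest patches.

```
#include <stdio.h>
#include <rte_lcore.h>

/*
 * Sketch: list the lcores of one NUMA socket, grouped by L3 domain.
 * The domain helpers below are placeholders following this RFC series;
 * rte_lcore_to_socket_id() is the existing lcore API.
 */
static void
dump_socket_lcores_by_l3(unsigned int socket_id)
{
	unsigned int dom_count = rte_get_domain_count(RTE_LCORE_DOMAIN_L3);
	unsigned int dom, pos;

	for (dom = 0; dom < dom_count; dom++) {
		unsigned int lc_count =
			rte_lcore_count_from_domain(RTE_LCORE_DOMAIN_L3, dom);

		for (pos = 0; pos < lc_count; pos++) {
			unsigned int lcore =
				rte_get_lcore_in_domain(RTE_LCORE_DOMAIN_L3,
							dom, pos);

			/* keep only lcores on the requested physical socket */
			if (rte_lcore_to_socket_id(lcore) != socket_id)
				continue;

			printf("socket %u, L3 domain %u: lcore %u\n",
			       socket_id, dom, lcore);
		}
	}
}
```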