[Public] Snipped
> >>>>
> >>>> <snipped>
> >>>>
> >>>>>> <snipped>
> >>>>>>
> >>>>>> Thank you Mattias for the comments and question, please let me
> >>>>>> try to explain the same below.
> >>>>>>
> >>>>>>> Shouldn't we have a separate CPU/cache hierarchy API instead?
> >>>>>>
> >>>>>> Based on the intention to bring in CPU lcores which share the
> >>>>>> same L3 (for better cache hits and less noisy neighbor), the
> >>>>>> current API focuses on using the Last Level Cache. But if the
> >>>>>> suggestion is `there are SoC where L2 cache are also shared, and
> >>>>>> the new API should be provisioned`, I am also comfortable with
> >>>>>> the thought.
> >>>>>>
> >>>>>
> >>>>> Rather than some AMD special case API hacked into <rte_lcore.h>, I
> >>>>> think we are better off with no DPDK API at all for this kind of
> >>>>> functionality.
> >>>>
> >>>> Hi Mattias, as shared in the earlier email thread, this is not an
> >>>> AMD special case at all. Let me try to explain this one more time.
> >>>> One of the techniques used to increase core count in a
> >>>> cost-effective way is to go for tiles of compute complexes. This
> >>>> introduces a bunch of cores sharing the same Last Level Cache
> >>>> (namely L2, L3 or even L4) depending upon cache topology
> >>>> architecture.
> >>>>
> >>>> The API suggested in the RFC is to help end users selectively use
> >>>> cores under the same Last Level Cache hierarchy as advertised by
> >>>> the OS (irrespective of the BIOS settings used). This is useful in
> >>>> both bare-metal and container environments.
> >>>>
> >>>
> >>> I'm pretty familiar with AMD CPUs and the use of tiles (including
> >>> the challenges these kinds of non-uniformities pose for work
> >>> scheduling).
> >>>
> >>> To maximize performance, caring about core<->LLC relationship may
> >>> well not be enough, and more HT/core/cache/memory topology
> >>> information is required. That's what I meant by special case.
> >>> A proper API should allow access to information about which lcores
> >>> are SMT siblings, cores on the same L2, and cores on the same L3, to
> >>> name a few things. Probably you want to fit NUMA into the same API
> >>> as well, although that is available already in <rte_lcore.h>.
> >>
> >> Thank you Mattias for the information. As shared in the reply to
> >> Anatoly, we want to expose a new API `rte_get_next_lcore_ex` which
> >> takes an extra argument `u32 flags`.
> >> The flags can be RTE_GET_LCORE_L1 (SMT), RTE_GET_LCORE_L2,
> >> RTE_GET_LCORE_L3, RTE_GET_LCORE_BOOST_ENABLED,
> >> RTE_GET_LCORE_BOOST_DISABLED.
> >
> > Wouldn't using that API be pretty awkward to use?

The current API available in DPDK is `rte_get_next_lcore`, which is used
within DPDK examples and in customer solutions. Based on the comments from
others, we responded to the idea of renaming the new API from
`rte_get_next_lcore_llc` to `rte_get_next_lcore_extnd`.
Can you please help us understand what is `awkward`?

> >
> > I mean, what you have is a topology, with nodes of different types and
> > with different properties, and you want to present it to the user.

Let me be clear: what we want via DPDK is to help customers use a unified
API which works across multiple platforms.
Example: let a vendor have 2 products, namely A and B. CPU-A has all cores
within the same sub-NUMA domain, and CPU-B has cores split into 2 sub-NUMA
domains based on a split LLC.
When `rte_get_next_lcore_extnd` is invoked for `LLC` on
1. CPU-A: it returns all cores, as there is no split.
2. CPU-B: it returns cores from the specific sub-NUMA domain which is
   partitioned by L3.

> >
> > In a sense, it's similar to XML and DOM versus SAX. The above is
> > SAX-style, and what I have in mind is something DOM-like.
> >
> > What use case do you have in mind? What's on top of my list is a
> > scenario where a DPDK app gets a bunch of cores (e.g., -l <cores>) and
> > tries to figure out how best make use of them.

Exactly.
> > It's not going to "skip" (ignore, leave unused) SMT siblings, or skip
> > non-boosted cores, it would just try to be clever in regards to which
> > cores to use for what purpose.

Let me share my idea on SMT siblings. When `rte_get_next_lcore_extnd` is
invoked with the `L1 | SMT` flag and an `lcore`, the API first checks
whether the given lcore is part of the enabled core list. If yes, it
programmatically identifies the sibling thread, using either `sysfs` or the
`hwloc` library (I shared the version concern on distros and will recheck),
and returns it. If no sibling thread is available under DPDK, it fetches
the next lcore (probably lcore + 1).

> >
> >> This is AMD EPYC SoC agnostic and trying to address for all generic
> >> cases.
> >> Please do let us know if we (Ferruh & myself) can sync up via call?
> >
> > Sure, I can do that.

Let me sync with Ferruh and get a time slot for an internal sync.

> >
> Can this be opened to the rest of the community? This is a common problem
> that needs to be solved for multiple architectures. I would be interested
> in attending.

Thank you Mattias. We did bring this up at the DPDK Bangkok summit 2024. As
per the suggestion from Thomas and Jerrin, we brought the RFC up for
discussion. For DPDK Montreal 2024, Keesang and Ferruh (most likely) are
travelling to the summit and presenting this as a talk to get things
moving.

>
> >>> <snipped>
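
To make the CPU-A / CPU-B example from this thread concrete, here is a
minimal sketch in plain C. It is not DPDK code and not the RFC
implementation: every name in it (`sketch_topo`,
`sketch_get_next_lcore_ex`, the `SKETCH_LCORE_*` flags) is hypothetical,
and the hard-coded cache-domain tables are an assumption standing in for
what would be discovered through sysfs or hwloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector flags, loosely following the names floated in
 * this thread. They are NOT part of any released DPDK header. */
#define SKETCH_LCORE_L2 (1u << 1)
#define SKETCH_LCORE_L3 (1u << 2)

#define SKETCH_MAX_LCORE 16
#define SKETCH_LCORE_NONE UINT32_MAX

/* Toy topology: per lcore, the id of the L2 and L3 domain it sits in.
 * Assumption: a real implementation would fill this from the OS. */
struct sketch_topo {
	uint32_t nb_lcores;
	uint32_t l2_id[SKETCH_MAX_LCORE];
	uint32_t l3_id[SKETCH_MAX_LCORE];
};

/* CPU-A: one unsplit L3 shared by all 8 lcores. */
static const struct sketch_topo cpu_a = {
	.nb_lcores = 8,
	.l2_id = {0, 0, 1, 1, 2, 2, 3, 3},
	.l3_id = {0, 0, 0, 0, 0, 0, 0, 0},
};

/* CPU-B: L3 split into two halves of 4 lcores each. */
static const struct sketch_topo cpu_b = {
	.nb_lcores = 8,
	.l2_id = {0, 0, 1, 1, 2, 2, 3, 3},
	.l3_id = {0, 0, 0, 0, 1, 1, 1, 1},
};

/* Domain id of 'lcore' at the cache level selected by 'flags'
 * (L2 if SKETCH_LCORE_L2 is set, otherwise L3). */
static uint32_t
sketch_domain(const struct sketch_topo *t, uint32_t lcore, uint32_t flags)
{
	return (flags & SKETCH_LCORE_L2) ? t->l2_id[lcore] : t->l3_id[lcore];
}

/* Next lcore after 'prev' that shares the selected cache with 'ref',
 * or SKETCH_LCORE_NONE when the scan is exhausted. Pass
 * prev = SKETCH_LCORE_NONE to start scanning from lcore 0. */
static uint32_t
sketch_get_next_lcore_ex(const struct sketch_topo *t, uint32_t ref,
			 uint32_t prev, uint32_t flags)
{
	assert(t != NULL && ref < t->nb_lcores);

	uint32_t i = (prev == SKETCH_LCORE_NONE) ? 0 : prev + 1;

	for (; i < t->nb_lcores; i++)
		if (sketch_domain(t, i, flags) == sketch_domain(t, ref, flags))
			return i;
	return SKETCH_LCORE_NONE;
}

/* Count the lcores sharing the selected cache with 'ref' (ref included). */
static uint32_t
sketch_count(const struct sketch_topo *t, uint32_t ref, uint32_t flags)
{
	uint32_t n = 0;
	uint32_t i = sketch_get_next_lcore_ex(t, ref, SKETCH_LCORE_NONE, flags);

	while (i != SKETCH_LCORE_NONE) {
		n++;
		i = sketch_get_next_lcore_ex(t, ref, i, flags);
	}
	return n;
}
```

With these toy tables, iterating with `SKETCH_LCORE_L3` from any lcore on
`cpu_a` walks all 8 lcores, while on `cpu_b` it walks only the 4 lcores in
the same L3 half as the reference lcore. The same caller code runs
unchanged on both, which is the "unified API" point made above. A
DOM-style alternative, as Mattias suggests, would instead hand the whole
table to the application.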