On Wed, Mar 19, 2025 at 5:55 PM Bruce Richardson
<bruce.richard...@intel.com> wrote:
>
> On Wed, Mar 19, 2025 at 05:31:45PM +0100, David Marchand wrote:
> > On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
> > <bruce.richard...@intel.com> wrote:
> > >
> > > In cases where the number of cores on a given socket is greater than
> > > RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> > > on a system. Fix this limitation by having the EAL probe the NUMA node
> > > for cores it isn't going to use, and recording that for completeness.
> > >
> > > This is necessary as memory is tracked per node, and with the --lcores
> > > parameters our app lcores may be on different sockets than the lcore ids
> > > may imply. For example, lcore 0 is on socket zero, but if app is run
> > > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > > needs to be aware of that socket.
> > >
> > > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richard...@intel.com>
> >
> > On the principle, the fix lgtm.
> >
> > I have one comment.
> >
> > >
> > > ---
> > > v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> > > ---
> > >  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> > > index 2ff9252c52..820a6534b1 100644
> > > --- a/lib/eal/common/eal_common_lcore.c
> > > +++ b/lib/eal/common/eal_common_lcore.c
> > > @@ -144,7 +144,11 @@ rte_eal_cpu_init(void)
> > >         unsigned lcore_id;
> > >         unsigned count = 0;
> > >         unsigned int socket_id, prev_socket_id;
> > > -       int lcore_to_socket_id[RTE_MAX_LCORE];
> > > +#if CPU_SETSIZE > RTE_MAX_LCORE
> > > +       int lcore_to_socket_id[CPU_SETSIZE] = {0};
> > > +#else
> > > +       int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
> > > +#endif
> >
> > This initialisation was unneeded so far because, in the next loop (on
> > each possible lcore), eal_cpu_socket_id() (returning 0 even for
> > errors) was called regardless of eal_cpu_detected().
> > Moving this call after eal_cpu_detected() would be consistent with the
> > rest of this patch.
> >
>
> So keep the zero-init, and move the function call to set the initial values
> in the array then?

I see no elegant way with current code.
I would completely separate this socket discovery from the rest...
Anyway, this is not the subject of this fix, so I'll withdraw this comment.

> >
> > It is unrelated to this patch itself, but I also have some doubt about
> > the socket_id value stored per lcore, as no check against
> > RTE_MAX_NUMA_NODES is done afterwards.
> > (it is probably never hit since the default value for RTE_MAX_NUMA_NODES is
> > 32).
> >
>
> Well, it's an open question whether RTE_MAX_NUMA_NODES is the max value for a
> node id, or the maximum number of ids which can be handled. I imagine most
> of the code assumes both - that we have sequential numa nodes with value <
> MAX.

Regardless of the meaning, we can end up in a situation where a lcore
has a socket_id set in lcore_config[] / rte_lcore_XX API, that is
outside the list of numa nodes stored in config->numa_nodes[] /
rte_socket_XX API, which is used for memory init for example.

-- 
David Marchand
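
For illustration only, the inconsistency described above (an lcore whose
socket_id is missing from the recorded node list) could be detected with a
check along the following lines. This is a standalone sketch, not DPDK code:
every `sketch_` / `SKETCH_` name, the struct layout and the array sizes are
hypothetical stand-ins for lcore_config[] and config->numa_nodes[], chosen
only to make the example compile on its own.

```c
#include <stdbool.h>

/* Hypothetical sizes for the sketch; not the DPDK build-time values. */
#define SKETCH_MAX_LCORE 8
#define SKETCH_MAX_NUMA_NODES 4

/* Hypothetical, simplified view of the two places a socket id is recorded. */
struct sketch_config {
	unsigned int lcore_socket_id[SKETCH_MAX_LCORE]; /* socket id per lcore */
	unsigned int numa_node_count;                   /* valid entries below */
	unsigned int numa_nodes[SKETCH_MAX_NUMA_NODES]; /* discovered node ids */
};

/* Return true if socket_id appears in the recorded node list. */
static bool
sketch_socket_is_known(const struct sketch_config *cfg, unsigned int socket_id)
{
	unsigned int i;

	for (i = 0; i < cfg->numa_node_count; i++)
		if (cfg->numa_nodes[i] == socket_id)
			return true;
	return false;
}

/*
 * Return the index of the first lcore whose socket id is absent from the
 * node list, or -1 if every inspected lcore is consistent.
 */
static int
sketch_first_orphan_lcore(const struct sketch_config *cfg,
		unsigned int lcore_count)
{
	unsigned int i;

	for (i = 0; i < lcore_count && i < SKETCH_MAX_LCORE; i++)
		if (!sketch_socket_is_known(cfg, cfg->lcore_socket_id[i]))
			return (int)i;
	return -1;
}
```

The lookup walks the node list rather than comparing against a maximum,
so it does not depend on which of the two readings of RTE_MAX_NUMA_NODES
is the intended one.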