Hi David,

I am not sure it works either if the lcores are set manually with a gap,
e.g. `--lcores=0,7` (from `eal_parse_lcores`):
- lcore 0 will get core_index = 0
- lcore 7 will get core_index = 1
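
To make the case easy to check, here is a small standalone model of the loop in the proposed hunk. It is only a sketch, not DPDK code: `GAP_MAX_LCORE` and `hunk_core_index()` are hypothetical names, and the `in_use` array stands in for `cfg->lcore_role[] != ROLE_OFF`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GAP_MAX_LCORE 8 /* small stand-in for RTE_MAX_LCORE (assumption) */

/*
 * Model of the loop in the proposed hunk: walk the lcore ids in order,
 * count the ones already in use, and stop at the first free one.
 * Returns the core_index the hunk would assign, or -1 if no lcore is
 * free. The chosen lcore id is stored in *lcore_id.
 */
static int
hunk_core_index(const bool in_use[GAP_MAX_LCORE], unsigned int *lcore_id)
{
	unsigned int index = 0;
	unsigned int id;

	assert(lcore_id != NULL);
	for (id = 0; id < GAP_MAX_LCORE; id++) {
		if (in_use[id]) {
			index++; /* one more in-use lcore below the free slot */
			continue;
		}
		*lcore_id = id;
		return (int)index;
	}
	return -1;
}
```

With only lcores 0 and 7 marked in use, the model picks lcore id 1 and returns index 1, which is the core_index lcore 7 already holds.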

When calling `rte_thread_register`, we will hit lcore 1 as the first
unassigned lcore and set its core_index to 1 as well, which collides with
lcore 7.

One solution could be to keep a bitmap of the core_index values currently
in use, stored in the global config.

Please let me know what you think about that.

Maxime Peim

On Mon, Jun 8, 2026 at 6:35 PM David Marchand <[email protected]> wrote:

> On Mon, 8 Jun 2026 at 18:10, David Marchand <[email protected]> wrote:
> >
> > On Wed, 22 Apr 2026 at 09:54, Maxime Peim <[email protected]> wrote:
> > >
> > > Threads registered via rte_thread_register() are assigned a valid
> > > lcore_id by eal_lcore_non_eal_allocate(), but their core_index in
> > > lcore_config is left at -1. This value was set during rte_eal_cpu_init()
> > > for lcores with ROLE_OFF (undetected CPUs) and is never updated when the
> > > lcore is later allocated to a non-EAL thread.
> > >
> > > As a result, rte_lcore_index() returns -1 for registered non-EAL
> > > threads. Libraries that use rte_lcore_index() to select per-lcore
> > > caches fall back to a shared global path when it returns -1, causing
> > > severe contention under concurrent access from multiple registered
> > > threads.
> > >
> > > A concrete example is the mlx5 indexed memory pool (mlx5_ipool), which
> > > uses rte_lcore_index() in mlx5_ipool_malloc_cache() to select a per-core
> > > cache slot. When core_index is -1, all registered threads are funneled
> > > into a single shared slot protected by a spinlock. In testing with VPP
> > > (which registers worker threads via rte_thread_register()), this caused
> > > async flow rule insertion throughput to drop from ~6.4M rules/sec to
> > > ~1.2M rules/sec with 4 workers -- a 5x regression attributable entirely
> > > to spinlock contention in the ipool allocator.
> > >
> > > Fix by setting core_index to the next sequential index (cfg->lcore_count)
> > > in eal_lcore_non_eal_allocate() before incrementing the count. Also reset
> > > core_index back to -1 on the error rollback path and in
> > > eal_lcore_non_eal_release() for correctness.
> > >
> > > Fixes: 5c307ba2a5b1 ("eal: register non-EAL threads as lcores")
> > Cc: [email protected]
> >
> > > Signed-off-by: Maxime Peim <[email protected]>
> > Acked-by: David Marchand <[email protected]>
> >
>
> Hum, I did not push the change.
> Re-reading this code, we have an issue if some external thread
> unregisters in the middle.
>
> What do you think of the additional hunk:
>
> $ git diff
> diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> index ae085d73e4..6f53f20d90 100644
> --- a/lib/eal/common/eal_common_lcore.c
> +++ b/lib/eal/common/eal_common_lcore.c
> @@ -372,13 +372,16 @@ eal_lcore_non_eal_allocate(void)
>         struct rte_config *cfg = rte_eal_get_configuration();
>         struct lcore_callback *callback;
>         struct lcore_callback *prev;
> +       unsigned int index = 0;
>         unsigned int lcore_id;
>
>         rte_rwlock_write_lock(&lcore_lock);
>         for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> -               if (cfg->lcore_role[lcore_id] != ROLE_OFF)
> +               if (cfg->lcore_role[lcore_id] != ROLE_OFF) {
> +                       index++;
>                         continue;
> -               lcore_config[lcore_id].core_index = cfg->lcore_count;
> +               }
> +               lcore_config[lcore_id].core_index = index;
>                 cfg->lcore_role[lcore_id] = ROLE_NON_EAL;
>                 cfg->lcore_count++;
>                 break;
>
>
> --
> David Marchand
>
>
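
For discussion, a rough standalone sketch of the bitmap idea follows. It is not DPDK code and makes several assumptions: `struct index_map` and the `index_map_*()` helpers are hypothetical names, `MODEL_MAX_LCORE` stands in for `RTE_MAX_LCORE`, and where such a map would actually live in the global config (and how it is locked) is left open. The idea is only that each core_index in use owns one bit, registration takes the lowest clear bit, and release clears it.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_LCORE 128 /* stand-in for RTE_MAX_LCORE (assumption) */

/* Hypothetical map: one bit per core_index currently in use. */
struct index_map {
	uint64_t used[MODEL_MAX_LCORE / 64];
};

/* Mark an index as used, e.g. for lcores parsed from --lcores at init. */
static void
index_map_set(struct index_map *m, unsigned int idx)
{
	assert(idx < MODEL_MAX_LCORE);
	m->used[idx / 64] |= UINT64_C(1) << (idx % 64);
}

/* Give an index back, e.g. when a non-EAL thread unregisters. */
static void
index_map_clear(struct index_map *m, unsigned int idx)
{
	assert(idx < MODEL_MAX_LCORE);
	m->used[idx / 64] &= ~(UINT64_C(1) << (idx % 64));
}

/* Take the lowest free index and mark it used; -1 if none is left. */
static int
index_map_alloc(struct index_map *m)
{
	unsigned int idx;

	for (idx = 0; idx < MODEL_MAX_LCORE; idx++) {
		uint64_t bit = UINT64_C(1) << (idx % 64);

		if ((m->used[idx / 64] & bit) == 0) {
			m->used[idx / 64] |= bit;
			return (int)idx;
		}
	}
	return -1;
}
```

In the `--lcores=0,7` case, indexes 0 and 1 would be set at init, so the first registered thread would get index 2 rather than 1. An index freed by an unregistering thread would be reused by the next registration instead of being handed out twice.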

