Hi David

I am not sure it works either if the lcores are manually set with a gap.
With `--lcores=0,7`, `eal_parse_lcores` assigns:
- lcore 0 will get core_index = 0
- lcore 7 will get core_index = 1

When calling `rte_thread_register`, we will hit lcore 1 as the first
unassigned lcore. By then `index` has been incremented once (for lcore 0),
so lcore 1 gets core_index = 1 as well, colliding with lcore 7.
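
To make the collision concrete, here is a small standalone model of the
allocation loop from your hunk. It is only a sketch, not the EAL code:
`struct model`, `model_init_0_7` and `model_allocate` are hypothetical names,
and `MAX_LCORE` is a small stand-in for `RTE_MAX_LCORE`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LCORE 8 /* hypothetical, small stand-in for RTE_MAX_LCORE */

enum role { ROLE_RTE, ROLE_OFF, ROLE_NON_EAL };

/* Hypothetical model of the per-lcore state relevant to this thread. */
struct model {
	enum role role[MAX_LCORE];
	int core_index[MAX_LCORE];
};

/* Model of --lcores=0,7: indexes follow the order of appearance. */
static void model_init_0_7(struct model *m)
{
	assert(m != NULL);
	for (int id = 0; id < MAX_LCORE; id++) {
		m->role[id] = ROLE_OFF;
		m->core_index[id] = -1;
	}
	m->role[0] = ROLE_RTE;
	m->core_index[0] = 0;
	m->role[7] = ROLE_RTE;
	m->core_index[7] = 1;
}

/*
 * Model of the proposed hunk: count the non-OFF lcores seen before the
 * first free slot and use that count as core_index.
 * Returns the allocated lcore id, or -1 if none is free.
 */
static int model_allocate(struct model *m)
{
	int index = 0;

	assert(m != NULL);
	for (int id = 0; id < MAX_LCORE; id++) {
		if (m->role[id] != ROLE_OFF) {
			index++;
			continue;
		}
		m->core_index[id] = index;
		m->role[id] = ROLE_NON_EAL;
		return id;
	}
	return -1;
}
```

In this model the first call to `model_allocate()` picks lcore 1 and gives it
core_index 1, the same value already held by lcore 7.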

One solution would be to keep a bitmap of the core_index values currently
in use, stored in the global config.
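
A rough sketch of what I have in mind, as a standalone model rather than a
patch. The names (`struct index_bitmap`, `index_bitmap_mark`,
`index_bitmap_alloc`, `index_bitmap_free`) and the `MAX_LCORE` value are
hypothetical. In EAL I assume this would live next to `lcore_count` and be
updated under `lcore_lock`, which the sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE 128 /* hypothetical stand-in for RTE_MAX_LCORE */
#define WORD_BITS 64

/* One bit per core_index value; a set bit means "in use". */
struct index_bitmap {
	uint64_t words[MAX_LCORE / WORD_BITS];
};

/* Record an index handed out elsewhere, e.g. while parsing --lcores. */
static void index_bitmap_mark(struct index_bitmap *bm, unsigned int idx)
{
	assert(bm != NULL && idx < MAX_LCORE);
	bm->words[idx / WORD_BITS] |= UINT64_C(1) << (idx % WORD_BITS);
}

/* Release an index, e.g. when a non-EAL thread unregisters. */
static void index_bitmap_free(struct index_bitmap *bm, unsigned int idx)
{
	assert(bm != NULL && idx < MAX_LCORE);
	bm->words[idx / WORD_BITS] &= ~(UINT64_C(1) << (idx % WORD_BITS));
}

/* Take the lowest unused index; returns -1 when all are in use. */
static int index_bitmap_alloc(struct index_bitmap *bm)
{
	assert(bm != NULL);
	for (unsigned int w = 0; w < MAX_LCORE / WORD_BITS; w++) {
		for (unsigned int bit = 0; bit < WORD_BITS; bit++) {
			uint64_t mask = UINT64_C(1) << bit;

			if (bm->words[w] & mask)
				continue;
			bm->words[w] |= mask;
			return (int)(w * WORD_BITS + bit);
		}
	}
	return -1;
}
```

With `--lcores=0,7`, indexes 0 and 1 would be marked at init, so the first
registered thread would get index 2 whichever lcore_id it lands on, and an
index freed on unregister could be reused by the next registration.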

Please let me know what you think about that.

Maxime Peim

On Mon, Jun 8, 2026 at 6:35 PM David Marchand <[email protected]>
wrote:

> On Mon, 8 Jun 2026 at 18:10, David Marchand <[email protected]>
> wrote:
> >
> > On Wed, 22 Apr 2026 at 09:54, Maxime Peim <[email protected]> wrote:
> > >
> > > Threads registered via rte_thread_register() are assigned a valid
> > > lcore_id by eal_lcore_non_eal_allocate(), but their core_index in
> > > lcore_config is left at -1. This value was set during
> > > rte_eal_cpu_init() for lcores with ROLE_OFF (undetected CPUs) and is
> > > never updated when the lcore is later allocated to a non-EAL thread.
> > >
> > > As a result, rte_lcore_index() returns -1 for registered non-EAL
> > > threads. Libraries that use rte_lcore_index() to select per-lcore
> > > caches fall back to a shared global path when it returns -1, causing
> > > severe contention under concurrent access from multiple registered
> > > threads.
> > >
> > > A concrete example is the mlx5 indexed memory pool (mlx5_ipool), which
> > > uses rte_lcore_index() in mlx5_ipool_malloc_cache() to select a
> > > per-core cache slot. When core_index is -1, all registered threads are
> > > funneled into a single shared slot protected by a spinlock. In testing
> > > with VPP (which registers worker threads via rte_thread_register()),
> > > this caused async flow rule insertion throughput to drop from ~6.4M
> > > rules/sec to ~1.2M rules/sec with 4 workers -- a 5x regression
> > > attributable entirely to spinlock contention in the ipool allocator.
> > >
> > > Fix by setting core_index to the next sequential index
> > > (cfg->lcore_count) in eal_lcore_non_eal_allocate() before incrementing
> > > the count. Also reset core_index back to -1 on the error rollback path
> > > and in eal_lcore_non_eal_release() for correctness.
> > >
> > > Fixes: 5c307ba2a5b1 ("eal: register non-EAL threads as lcores")
> > Cc: [email protected]
> >
> > > Signed-off-by: Maxime Peim <[email protected]>
> > Acked-by: David Marchand <[email protected]>
> >
>
> Hum, I did not push the change.
> Re-reading this code, we have an issue if some external thread
> unregisters in the middle.
>
> What do you think of the additional hunk:
>
> $ git diff
> diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> index ae085d73e4..6f53f20d90 100644
> --- a/lib/eal/common/eal_common_lcore.c
> +++ b/lib/eal/common/eal_common_lcore.c
> @@ -372,13 +372,16 @@ eal_lcore_non_eal_allocate(void)
>         struct rte_config *cfg = rte_eal_get_configuration();
>         struct lcore_callback *callback;
>         struct lcore_callback *prev;
> +       unsigned int index = 0;
>         unsigned int lcore_id;
>
>         rte_rwlock_write_lock(&lcore_lock);
>         for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> -               if (cfg->lcore_role[lcore_id] != ROLE_OFF)
> +               if (cfg->lcore_role[lcore_id] != ROLE_OFF) {
> +                       index++;
>                         continue;
> -               lcore_config[lcore_id].core_index = cfg->lcore_count;
> +               }
> +               lcore_config[lcore_id].core_index = index;
>                 cfg->lcore_role[lcore_id] = ROLE_NON_EAL;
>                 cfg->lcore_count++;
>                 break;
>
>
> --
> David Marchand
>
>
