On Wed, 6 Sep 2023 19:54:32 +0200 Morten Brørup <m...@smartsharesystems.com> wrote:
> >
> > -       idx = rte_lcore_id();
> > +       seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
> > +       if (unlikely(seed != rand_state->seed)) {
>
> Please note that rte_rand_seed lives in a completely different cache
> line than RTE_PER_LCORE(rte_rand_state), so the comparison with
> rte_rand_seed requires reading one more cache line than the original
> implementation, which only uses the cache line holding
> rand_states[idx].
>
> This is in the hot path.
>
> If we could register a per-thread INIT function, the lazy
> initialization could be avoided, and only one cache line accessed.

Since rte_rand_seed rarely changes, it will stay cached on each CPU.
The problem before was prefetcher-induced cache-line overlap between
adjacent per-lcore states, which caused false sharing.
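
Roughly, the pattern under discussion looks like this. This is only a
sketch, not the actual rte_rand code: the struct layout, the alignment
constant, and the rand_state_init() derivation are simplified
assumptions for illustration.

#include <stdint.h>

#define CACHE_LINE_SIZE 64

/*
 * Per-thread PRNG state. Cache-line alignment keeps each thread's
 * state on its own line, so neighboring states (or the prefetcher
 * pulling in an adjacent line) can no longer cause false sharing.
 */
struct rand_state {
	uint64_t z[5];	/* lfsr258 state words (z1..z5 in rte_rand) */
	uint64_t seed;	/* global seed this state was derived from */
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Written only by rte_srand(); read-mostly, so each CPU keeps it
 * cached in shared state and the hot-path load is cheap. */
static uint64_t rte_rand_seed;

static __thread struct rand_state thread_rand_state;

/* Simplified seeding; the real lfsr258 seeding rules differ. */
static void
rand_state_init(struct rand_state *s, uint64_t seed)
{
	uint64_t x = seed;
	int i;

	for (i = 0; i < 5; i++) {
		/* LCG step to spread the seed across the state words. */
		x = x * 6364136223846793005ull + 1442695040888963407ull;
		/* Crude guard to satisfy lfsr258 per-word minimums. */
		s->z[i] = x | (1ull << 24);
	}
}

/*
 * Hot path: one relaxed load of rte_rand_seed. Only when rte_srand()
 * has changed the seed since this thread last derived its state do we
 * lazily re-initialize; otherwise the compare hits a cached line.
 */
static struct rand_state *
rand_state_get(void)
{
	struct rand_state *s = &thread_rand_state;
	uint64_t seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);

	if (__builtin_expect(seed != s->seed, 0)) {
		rand_state_init(s, seed);
		s->seed = seed;
	}
	return s;
}

So the extra cache line Morten mentions is read-mostly and shared, while
the per-thread state line stays exclusive to its own thread.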