> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Saturday, 9 September 2023 08.45
>
> On 2023-09-09 02:13, Konstantin Ananyev wrote:
> > 06/09/2023 21:02, Mattias Rönnblom wrote:
> >> On 2023-09-06 19:20, Stephen Hemminger wrote:
> >>> Move the random number state into thread local storage.
> >>
> >> Me and Morten discussed TLS versus other alternatives in some other
> >> thread. The downside of TLS that Morten pointed out, from what I
> >> recall, is that lazy initialization is *required* (since the number of
> >> threads is open-ended), and the data ends up in non-huge page memory.
> >
> > Hmm.. correct me if I am wrong, but with current implementation,
> > rand state is also in non-huge memory:
> > static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
> >
>
> Yes. The current pattern is certainly not perfect.
>
> >
> >> It was also unclear to me what the memory footprint implications would
> >> be, would large per-lcore data structures be put in TLS. More
> >> specifically, if they would be duplicated across all threads, even
> >> non-lcore threads.
> >>
> >> None of these issues affect rte_random.c's potential usage of TLS
> >> (except lazy [re-]initialization makes things more complicated).
> >>
> >> Preferably, there should be one pattern that is usable across all or
> >> at least most DPDK modules requiring per-lcore state.
> >>
> >>> This has several benefits.
> >>> - no false cache sharing from cpu prefetching
> >>> - fixes initialization of random state for non-DPDK threads
> >>
> >> This seems like a non-reason to me. That bug is easily fixed, if it
> >> isn't already.
> >>
> >>> - fixes unsafe usage of random state by non-DPDK threads
> >>>
> >>
> >> "Makes random number generation MT safe from all threads (including
> >> unregistered non-EAL threads)."
> >>
> >> With current API semantics you may still register a non-EAL thread,
> >> to get MT safe access to this API, so I guess it's more about being
> >> more convenient and less error prone, than anything else.
> >
> > I understand that we never guaranteed MT safety for non-EAL threads here,
>
>
> Registered non-EAL threads have a lcore id and thus may safely call
> rte_rand(). Multiple unregistered non-EAL threads may not do so, in
> parallel.
>
>
> > but as a user of rte_rand() - it would be much more convenient, if I can
> > use it from any thread without worrying whether it is an EAL thread or not.
>
> Sure, especially if it comes for free. The for-free solution has yet to
> reveal itself though.
We could offer re-entrant function variants for non-EAL threads:

uint64_t rte_rand_r(struct rte_rand_state * const state);
void rte_srand_r(struct rte_rand_state * const state, uint64_t seed);
uint64_t rte_rand_max_r(struct rte_rand_state * const state, uint64_t upper_bound);
double rte_drand_r(struct rte_rand_state * const state);

For this to work, we would have to make struct rte_rand_state public, and the application would need to allocate it. (At least one instance per thread that uses it, obviously.)

> >
> > About TLS usage and re-seeding - can we use some sort of middle-ground:
> > extend rte_rand_state with some gen-counter.
> > Make a 'master' copy of rte_rand_state that will be updated by rte_srand(),
> > and TLS copies of rte_rand_state, so rte_rand() can first compare
> > its gen-counter value with master copy to decide,
> > does it need to copy new state from master or not.
> >
>
> Calling threads shouldn't all produce the same sequence. That would be
> silly and not very random. The generation number should be tied to the
> seed.

I previously thought about seeding... We are trying to be random, we are not explicitly pseudo-random.

So I came to the conclusion that the ability to reproduce data (typically for verification purposes) is not a requirement here.

> >
> >> The new MT safety guarantees should be in the API docs as well.
> >
> > Yes, it is an extension to the current API, not a fix.
> >
> >>
> >>> The initialization of random number state is done by the
> >>> lcore (lazy initialization).
> >>>
> >>> Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
> >>> ---
> >>>   lib/eal/common/rte_random.c | 38 +++++++++++++++++++------------------
> >>>   1 file changed, 20 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
> >>> index 53636331a27b..9657adf6ad3b 100644
> >>> --- a/lib/eal/common/rte_random.c
> >>> +++ b/lib/eal/common/rte_random.c
> >>> @@ -19,13 +19,14 @@ struct rte_rand_state {
> >>>       uint64_t z3;
> >>>       uint64_t z4;
> >>>       uint64_t z5;
> >>> -} __rte_cache_aligned;
> >>> +    uint64_t seed;
> >>> +};
> >>> -/* One instance each for every lcore id-equipped thread, and one
> >>> - * additional instance to be shared by all others threads (i.e., all
> >>> - * unregistered non-EAL threads).
> >>> - */
> >>> -static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
> >>> +/* Global random seed */
> >>> +static uint64_t rte_rand_seed;
> >>> +
> >>> +/* Per lcore random state. */
> >>> +static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
> >>>   static uint32_t
> >>>   __rte_rand_lcg32(uint32_t *seed)
> >>> @@ -81,11 +82,7 @@ __rte_srand_lfsr258(uint64_t seed, struct rte_rand_state *state)
> >>>   void
> >>>   rte_srand(uint64_t seed)
> >>>   {
> >>> -    unsigned int lcore_id;
> >>> -
> >>> -    /* add lcore_id to seed to avoid having the same sequence */
> >>> -    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> >>> -        __rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
> >>> +    __atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
> >>>   }
> >>>   static __rte_always_inline uint64_t
> >>> @@ -119,15 +116,18 @@ __rte_rand_lfsr258(struct rte_rand_state *state)
> >>>   static __rte_always_inline
> >>>   struct rte_rand_state *__rte_rand_get_state(void)
> >>>   {
> >>> -    unsigned int idx;
> >>> +    struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);
> >>
> >> There should really be a RTE_PER_THREAD, an alias to RTE_PER_LCORE, to
> >> cover this usage. Or just use __thread (or _Thread_local?).
> >>
> >>> +    uint64_t seed;
> >>> -    idx = rte_lcore_id();
> >>> +    seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
> >>> +    if (unlikely(seed != rand_state->seed)) {
> >>> +        rand_state->seed = seed;
> >>
> >> Re-seeding should restart the series, on all lcores. There's nothing
> >> preventing the user from re-seeding the machinery repeatedly, with the
> >> same seed. Seems like an unusual, but still valid, use case, if you
> >> run repeated tests of some sort.
> >>
> >> Use a seqlock? :) I guess you need a seed generation number as well
> >> (e.g., is this the first time you seed with X, or the second one, etc.)
> >>
> >>> -    /* last instance reserved for unregistered non-EAL threads */
> >>> -    if (unlikely(idx == LCORE_ID_ANY))
> >>> -        idx = RTE_MAX_LCORE;
> >>> +        seed += rte_thread_self().opaque_id;
> >>> +        __rte_srand_lfsr258(seed, rand_state);
> >>> +    }
> >>> -    return &rand_states[idx];
> >>> +    return rand_state;
> >>>   }
> >>>   uint64_t
> >>> @@ -227,7 +227,9 @@ RTE_INIT(rte_rand_init)
> >>>   {
> >>>       uint64_t seed;
> >>> -    seed = __rte_random_initial_seed();
> >>> +    do
> >>> +        seed = __rte_random_initial_seed();
> >>> +    while (seed == 0);
> >>
> >> Might be worth a comment why seed 0 is not allowed. Alternatively, use
> >> some other way of signaling __rte_srand_lfsr258() must be called.
> >>
> >>>       rte_srand(seed);
> >>>   }
> >
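
For illustration only, the seed generation number discussed above could look roughly like the sketch below, written in plain C11 so it stands alone outside DPDK. All names in it (sketch_srand, sketch_rand, master_gen, thread_salt) are hypothetical, splitmix64 is only a stand-in for LFSR258, and the sketch assumes that only one thread seeds at a time; concurrent seeders would need something like the seqlock mentioned above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch, not the rte_random.c implementation. */

/* Master seed, plus a generation counter bumped on every (re-)seed. */
static _Atomic uint64_t master_seed = 1;
static _Atomic uint64_t master_gen = 1;

struct sketch_rand_state {
	uint64_t s;   /* stand-in generator state (splitmix64, not LFSR258) */
	uint64_t gen; /* generation this state was seeded from; 0 = never */
};

/* One instance per thread, zero-initialized, so gen starts out stale. */
static _Thread_local struct sketch_rand_state tls_state;

/* Assumption: called by one thread at a time. */
void
sketch_srand(uint64_t seed)
{
	atomic_store_explicit(&master_seed, seed, memory_order_relaxed);
	/* Release: a thread that observes the new generation also sees the seed. */
	atomic_fetch_add_explicit(&master_gen, 1, memory_order_release);
}

/* thread_salt is a hypothetical per-thread value (e.g., a thread id) that
 * keeps different threads from producing the same sequence. */
uint64_t
sketch_rand(uint64_t thread_salt)
{
	struct sketch_rand_state *st = &tls_state;
	uint64_t gen = atomic_load_explicit(&master_gen, memory_order_acquire);
	uint64_t z;

	/* Lazy (re-)seed: also triggers when the same seed value is set again. */
	if (st->gen != gen) {
		st->s = atomic_load_explicit(&master_seed, memory_order_relaxed) +
			thread_salt;
		st->gen = gen;
	}

	/* One splitmix64 step. */
	z = (st->s += 0x9e3779b97f4a7c15ULL);
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}
```

Because the comparison is on the generation rather than on the seed value, seeding twice with the same seed restarts the series in every thread, and seed 0 needs no special treatment.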