RE: [RFC] random: use per lcore state

Morten Brørup Sat, 09 Sep 2023 04:23:17 -0700

> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Saturday, 9 September 2023 08.45
> 
> On 2023-09-09 02:13, Konstantin Ananyev wrote:
> > 06/09/2023 21:02, Mattias Rönnblom пишет:
> >> On 2023-09-06 19:20, Stephen Hemminger wrote:
> >>> Move the random number state into thread local storage.
> >>
> >> Me and Morten discussed TLS versus other alternatives in some other
> >> thread. The downside of TLS that Morten pointed out, from what I
> >> recall, is that lazy initialization is *required* (since the number
> of
> >> threads is open-ended), and the data ends up in non-huge page memory.
> >
> > Hmm.. correct me if I am wrong, but with current implementation,
> > rand state is also in non-huge memory:
> > static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
> >
> 
> Yes. The current pattern is certainly not perfect.
> 
> >
> >> It was also unclear to me what the memory footprint implications
> would
> >> be,h would large per-lcore data structures be put in TLS. More
> >> specifically, if they would be duplicated across all threads, even
> >> non-lcore threads.
> >>
> >> None of these issues affect rte_random.c's potential usage of TLS
> >> (except lazy [re-]initialization makes things more complicated).
> >>
> >> Preferably, there should be one pattern that is usable across all or
> >> at least most DPDK modules requiring per-lcore state.
> >>
> >>> This has a several benefits.
> >>>   - no false cache sharing from cpu prefetching
> >>>   - fixes initialization of random state for non-DPDK threads
> >>
> >> This seems like a non-reason to me. That bug is easily fixed, if it
> >> isn't already.
> >>
> >>>   - fixes unsafe usage of random state by non-DPDK threads
> >>>
> >>
> >> "Makes random number generation MT safe from all threads (including
> >> unregistered non-EAL threads)."
> >>
> >> With current API semantics you may still register an non-EAL thread,
> >> to get MT safe access to this API, so I guess it's more about being
> >> more convenient and less error prone, than anything else.
> >
> > I understand that we never guaranteed MT safety for non-EAL threads
> here,
> 
> 
> Registered non-EAL threads have a lcore id and thus may safely call
> rte_rand(). Multiple unregistered non-EAL threads may not do so, in
> parallel.
> 
> 
> > but as a user of rte_rand() - it would be much more convenient, if I
> can
> > use it
> > from any thread wthout worring is it a EAL thread or not.
> 
> Sure, especially if it comes for free. The for-free solution has yet to
> reveal itself though.


We could offer re-entrant function variants for non-EAL threads:

uint64_t rte_rand_r(struct rte_rand_state * const state);
void rte_srand_r(struct rte_rand_state * const state, uint64_t seed);
uint64_t rte_rand_max_r(struct rte_rand_state * const state, uint64_t 
upper_bound);
double rte_drand_r(struct rte_rand_state * const state, void);

For this to work, we would have to make struct rte_rand_state public, and the 
application would need to allocate it. (At least one instance per thread that 
uses it, obviously.)

> 
> >
> > About TlS usage and re-seeding - can we use some sort of middle-
> ground:
> > extend rte_rand_state with some gen-counter.
> > Make a 'master' copy of rte_rand_state that will be updated by
> rte_srand(),
> > and TLS copies of rte_rand_state, so rte_rand() can fist compare
> > its gen-counter value with master copy to decide,
> > does it need to copy new state from master or not.
> >
> 
> Calling threads shouldn't all produce the same sequence. That would be
> silly and not very random. The generation number should be tied to the
> seed.

I previously thought about seeding...

We are trying to be random, we are not explicitly pseudo-random.

So I came to the conclusion that the ability to reproduce data (typically for 
verification purposes) is not a requirement here.

> 
> >
> >> The new MT safety guarantees should be in the API docs as well.
> >
> > Yes, it is an extension to the current API, not a fix.
> >
> >>
> >>> The initialization of random number state is done by the
> >>> lcore (lazy initialization).
> >>>
> >>> Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
> >>> ---
> >>>   lib/eal/common/rte_random.c | 38 +++++++++++++++++++--------------
> ----
> >>>   1 file changed, 20 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/lib/eal/common/rte_random.c
> b/lib/eal/common/rte_random.c
> >>> index 53636331a27b..9657adf6ad3b 100644
> >>> --- a/lib/eal/common/rte_random.c
> >>> +++ b/lib/eal/common/rte_random.c
> >>> @@ -19,13 +19,14 @@ struct rte_rand_state {
> >>>       uint64_t z3;
> >>>       uint64_t z4;
> >>>       uint64_t z5;
> >>> -} __rte_cache_aligned;
> >>> +    uint64_t seed;
> >>> +};
> >>> -/* One instance each for every lcore id-equipped thread, and one
> >>> - * additional instance to be shared by all others threads (i.e.,
> all
> >>> - * unregistered non-EAL threads).
> >>> - */
> >>> -static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
> >>> +/* Global random seed */
> >>> +static uint64_t rte_rand_seed;
> >>> +
> >>> +/* Per lcore random state. */
> >>> +static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
> >>>   static uint32_t
> >>>   __rte_rand_lcg32(uint32_t *seed)
> >>> @@ -81,11 +82,7 @@ __rte_srand_lfsr258(uint64_t seed, struct
> >>> rte_rand_state *state)
> >>>   void
> >>>   rte_srand(uint64_t seed)
> >>>   {
> >>> -    unsigned int lcore_id;
> >>> -
> >>> -    /* add lcore_id to seed to avoid having the same sequence */
> >>> -    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> >>> -        __rte_srand_lfsr258(seed + lcore_id,
> &rand_states[lcore_id]);
> >>> +    __atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
> >>>   }
> >>>   static __rte_always_inline uint64_t
> >>> @@ -119,15 +116,18 @@ __rte_rand_lfsr258(struct rte_rand_state
> *state)
> >>>   static __rte_always_inline
> >>>   struct rte_rand_state *__rte_rand_get_state(void)
> >>>   {
> >>> -    unsigned int idx;
> >>> +    struct rte_rand_state *rand_state =
> &RTE_PER_LCORE(rte_rand_state);
> >>
> >> There should really be a RTE_PER_THREAD, an alias to RTE_PER_LCORE,
> to
> >> cover this usage. Or just use __thread (or _Thread_local?).
> >>
> >>> +    uint64_t seed;
> >>> -    idx = rte_lcore_id();
> >>> +    seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
> >>> +    if (unlikely(seed != rand_state->seed)) {
> >>> +        rand_state->seed = seed;
> >>
> >> Re-seeding should restart the series, on all lcores. There's nothing
> >> preventing the user from re-seeding the machinery repeatedly, with
> the
> >> same seed. Seems like an unusual, but still valid, use case, if you
> >> run repeated tests of some sort.
> >>
> >> Use a seqlock? :) I guess you need a seed generation number as well
> >> (e.g., is this the first time you seed with X, or the second one,
> etc.)
> >>
> >>> -    /* last instance reserved for unregistered non-EAL threads */
> >>> -    if (unlikely(idx == LCORE_ID_ANY))
> >>> -        idx = RTE_MAX_LCORE;
> >>> +        seed += rte_thread_self().opaque_id;
> >>> +        __rte_srand_lfsr258(seed, rand_state);
> >>> +    }
> >>> -    return &rand_states[idx];
> >>> +    return rand_state;
> >>>   }
> >>>   uint64_t
> >>> @@ -227,7 +227,9 @@ RTE_INIT(rte_rand_init)
> >>>   {
> >>>       uint64_t seed;
> >>> -    seed = __rte_random_initial_seed();
> >>> +    do
> >>> +        seed = __rte_random_initial_seed();
> >>> +    while (seed == 0);
> >>
> >> Might be worth a comment why seed 0 is not allowed. Alternatively,
> use
> >> some other way of signaling __rte_srand_lfsr258() must be called.
> >>
> >>>       rte_srand(seed);
> >>>   }
> >

RE: [RFC] random: use per lcore state

Reply via email to