09/09/2023 07:45, Mattias Rönnblom пишет:
On 2023-09-09 02:13, Konstantin Ananyev wrote:
06/09/2023 21:02, Mattias Rönnblom пишет:
On 2023-09-06 19:20, Stephen Hemminger wrote:
Move the random number state into thread local storage.

Me and Morten discussed TLS versus other alternatives in some other thread. The downside of TLS that Morten pointed out, from what I recall, is that lazy initialization is *required* (since the number of threads is open-ended), and the data ends up in non-huge page memory.

Hmm.. correct me if I am wrong, but with current implementation,
rand state is also in non-huge memory:
static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];


Yes. The current pattern is certainly not perfect.


It was also unclear to me what the memory footprint implications would be,h would large per-lcore data structures be put in TLS. More specifically, if they would be duplicated across all threads, even non-lcore threads.

None of these issues affect rte_random.c's potential usage of TLS (except lazy [re-]initialization makes things more complicated).

Preferably, there should be one pattern that is usable across all or at least most DPDK modules requiring per-lcore state.

This has a several benefits.
  - no false cache sharing from cpu prefetching
  - fixes initialization of random state for non-DPDK threads

This seems like a non-reason to me. That bug is easily fixed, if it isn't already.

  - fixes unsafe usage of random state by non-DPDK threads


"Makes random number generation MT safe from all threads (including unregistered non-EAL threads)."

With current API semantics you may still register an non-EAL thread, to get MT safe access to this API, so I guess it's more about being more convenient and less error prone, than anything else.

I understand that we never guaranteed MT safety for non-EAL threads here,


Registered non-EAL threads have a lcore id and thus may safely call rte_rand().

I am aware about such ability, but for me register/unregister thread
just to call rte_rand() seems like way too much hassle.

Multiple unregistered non-EAL threads may not do so, in parallel.


but as a user of rte_rand() - it would be much more convenient, if I can use it
from any thread wthout worring is it a EAL thread or not.

Sure, especially if it comes for free. The for-free solution has yet to reveal itself though.


About TlS usage and re-seeding - can we use some sort of middle-ground:
extend rte_rand_state with some gen-counter.
Make a 'master' copy of rte_rand_state that will be updated by rte_srand(),
and TLS copies of rte_rand_state, so rte_rand() can fist compare
its gen-counter value with master copy to decide,
does it need to copy new state from master or not.


Calling threads shouldn't all produce the same sequence. That would be silly and not very random. The generation number should be tied to the seed.

Actually, yes you right, probably we don't need a master copy of rte_rand_state itself.
It seems that just having a 'master' copy of 'seed' value, plus some
counter (to indicate that seed has been updated) is enough here.



The new MT safety guarantees should be in the API docs as well.

Yes, it is an extension to the current API, not a fix.


The initialization of random number state is done by the
lcore (lazy initialization).

Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
---
  lib/eal/common/rte_random.c | 38 +++++++++++++++++++------------------
  1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index 53636331a27b..9657adf6ad3b 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -19,13 +19,14 @@ struct rte_rand_state {
      uint64_t z3;
      uint64_t z4;
      uint64_t z5;
-} __rte_cache_aligned;
+    uint64_t seed;
+};
-/* One instance each for every lcore id-equipped thread, and one
- * additional instance to be shared by all others threads (i.e., all
- * unregistered non-EAL threads).
- */
-static struct rte_rand_state rand_states[RTE_MAX_LCORE + 1];
+/* Global random seed */
+static uint64_t rte_rand_seed;
+
+/* Per lcore random state. */
+static RTE_DEFINE_PER_LCORE(struct rte_rand_state, rte_rand_state);
  static uint32_t
  __rte_rand_lcg32(uint32_t *seed)
@@ -81,11 +82,7 @@ __rte_srand_lfsr258(uint64_t seed, struct rte_rand_state *state)
  void
  rte_srand(uint64_t seed)
  {
-    unsigned int lcore_id;
-
-    /* add lcore_id to seed to avoid having the same sequence */
-    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
-        __rte_srand_lfsr258(seed + lcore_id, &rand_states[lcore_id]);
+    __atomic_store_n(&rte_rand_seed, seed, __ATOMIC_RELAXED);
  }
  static __rte_always_inline uint64_t
@@ -119,15 +116,18 @@ __rte_rand_lfsr258(struct rte_rand_state *state)
  static __rte_always_inline
  struct rte_rand_state *__rte_rand_get_state(void)
  {
-    unsigned int idx;
+    struct rte_rand_state *rand_state = &RTE_PER_LCORE(rte_rand_state);

There should really be a RTE_PER_THREAD, an alias to RTE_PER_LCORE, to cover this usage. Or just use __thread (or _Thread_local?).

+    uint64_t seed;
-    idx = rte_lcore_id();
+    seed = __atomic_load_n(&rte_rand_seed, __ATOMIC_RELAXED);
+    if (unlikely(seed != rand_state->seed)) {
+        rand_state->seed = seed;

Re-seeding should restart the series, on all lcores. There's nothing preventing the user from re-seeding the machinery repeatedly, with the same seed. Seems like an unusual, but still valid, use case, if you run repeated tests of some sort.

Use a seqlock? :) I guess you need a seed generation number as well (e.g., is this the first time you seed with X, or the second one, etc.)

-    /* last instance reserved for unregistered non-EAL threads */
-    if (unlikely(idx == LCORE_ID_ANY))
-        idx = RTE_MAX_LCORE;
+        seed += rte_thread_self().opaque_id;
+        __rte_srand_lfsr258(seed, rand_state);
+    }
-    return &rand_states[idx];
+    return rand_state;
  }
  uint64_t
@@ -227,7 +227,9 @@ RTE_INIT(rte_rand_init)
  {
      uint64_t seed;
-    seed = __rte_random_initial_seed();
+    do
+        seed = __rte_random_initial_seed();
+    while (seed == 0);

Might be worth a comment why seed 0 is not allowed. Alternatively, use some other way of signaling __rte_srand_lfsr258() must be called.

      rte_srand(seed);
  }


Reply via email to