On 2023-09-07 01:00, Stephen Hemminger wrote:
On Wed, 6 Sep 2023 22:02:54 +0200
Mattias Rönnblom <hof...@lysator.liu.se> wrote:
On 2023-09-06 19:20, Stephen Hemminger wrote:
Move the random number state into thread local storage.
Morten and I discussed TLS versus other alternatives in some other
thread. The downside of TLS that Morten pointed out, from what I recall,
is that lazy initialization is *required* (since the number of threads
is open-ended), and the data ends up in non-huge page memory. It was
also unclear to me what the memory footprint implications would be
if large per-lcore data structures were put in TLS. More specifically,
whether they would be duplicated across all threads, even non-lcore threads.
But the current method is unsafe on non-lcore threads.
Two non-lcore threads calling rte_rand() will clash on state without
any locking protection.
Sure, just like the API docs say, although the documentation uses more
precise terminology.
If you want to extend the API MT safety guarantees, it should come with
an argument for why this change is needed.
Is this to save the application from calling rte_thread_register() in
control plane threads? For convenience? Or for being generally less
error prone?
Another reason might be that the application has many threads (more
than RTE_MAX_LCORE), so it will run out of lcore ids.
Also, right now the array is sized at 129 entries to allow for the
maximum number of lcores. When the maximum is increased to 512 or 1024
the problem will get worse.
Using TLS will penalize every thread in the process, not only EAL
threads and registered non-EAL threads, and worse: not only threads that
are using the API in question.
Every thread will carry the TLS memory around, increasing the process
memory footprint.
Thread creation will be slower, since TLS memory is allocated *and
initialized*, lazy user code-level initialization or not.
On my particular Linux x86_64 system, pthread creation overhead looks
something like:
8 us w/o any user code-level use of TLS
11 us w/ 16 kB of TLS
314 us w/ 2 MB of TLS.
So, whatever you put into TLS, it needs to be small.
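
A minimal sketch of how a measurement of this kind could be reproduced,
assuming Linux, glibc, and GCC or Clang. TLS_BYTES, tls_blob, tls_peek()
and create_join_avg_us() are hypothetical names for illustration, not part
of DPDK. The loop times pthread_create() plus pthread_join(), so its
results are not directly comparable to the figures above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical knob: bytes of static TLS carried by every thread. */
#ifndef TLS_BYTES
#define TLS_BYTES (16 * 1024)
#endif

/*
 * A non-zero initializer places the array in the TLS initialization
 * image, so each new thread gets its own initialized copy, whether or
 * not that thread ever touches the data. __thread is a GCC/Clang
 * extension.
 */
static __thread char tls_blob[TLS_BYTES] = { 1 };

/* Reference the TLS object so the compiler cannot discard it. */
char tls_peek(void)
{
    return tls_blob[0];
}

/* Worker that deliberately does not use any TLS data. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Average wall-clock cost, in microseconds, of creating and joining
 * one thread. Returns a negative value if thread creation fails.
 */
double create_join_avg_us(int iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pthread_t t;

        if (pthread_create(&t, NULL, worker, NULL) != 0)
            return -1.0;
        pthread_join(t, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                (double)(end.tv_nsec - start.tv_nsec);
    return ns / iterations / 1e3;
}
```

Building with different -DTLS_BYTES=... values and calling
create_join_avg_us() from main() with a few thousand iterations should
show how the per-thread cost scales with TLS size on a given system.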
Putting a large amount of data into TLS will effectively prevent the
DPDK libraries from being linked into a heavily multi-threaded app,
regardless of whether those threads call into DPDK or not.
Again, this doesn't much affect rte_random.c, but does disqualify TLS as
a plug-in replacement for the current pattern with a statically
allocated lcore id-indexed array.