On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
Hi,
the documentation is pretty silent on this, and asking here is probably going
to be faster than me trying to use the source to figure this out.
Is it legal to call_rcu() from within the call_rcu() callback?
Yes. call_rcu callbacks can be chained.
Note that if you want to make sure no queued callbacks remain on clean
program shutdown, you'll need to issue rcu_barrier() on program exit at least
as many times as the longest chain of call_rcu callbacks. See this comment
above urcu_call_rcu_exit():
* Teardown the default call_rcu worker thread if there are no queued
* callbacks on process exit. This prevents leaking memory.
*
* Here is how an application can ensure graceful teardown of this
* worker thread:
*
* - An application queuing call_rcu callbacks should invoke
*   rcu_barrier() before it exits.
* - When chaining call_rcu callbacks, the number of calls to
*   rcu_barrier() on application exit must match at least the maximum
*   number of chained callbacks.
* - If an application chains callbacks endlessly, it would have to be
*   modified to stop chaining callbacks when it detects an application
*   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
*   after setting that flag.
* - The statements above apply to a library which queues call_rcu
*   callbacks, only it needs to invoke rcu_barrier in its library
*   destructor.
What about the other RCU (and CDS) API calls?
They can be called from a call_rcu callback unless stated otherwise. For
instance, rcu_barrier() cannot be called from a call_rcu worker thread.
How does that interact with create_call_rcu_data()? I have <n> event loops and
I am initializing <n> 1:1 call_rcu helper threads as I need to do some
per-thread initialization as some of the destroy-like functions use random
numbers (don't ask).
As I recall, set_thread_call_rcu_data() will associate a call_rcu worker
instance with the current thread. So all following call_rcu() invocations from
that thread will be queued into this per-thread call_rcu queue, and handled by
the associated call_rcu worker thread.
But I wonder why you inherently need this 1:1 mapping, rather than using the
content of the structure containing the rcu_head to figure out which per-thread
data should be used?
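
One way to read that suggestion, as a sketch only: the object embedding the
head carries a pointer to its owning context, and the callback recovers it
with a container_of-style lookup. To keep this compilable without liburcu,
struct cb_head is a stand-in for struct rcu_head, and the local container_of
macro stands in for caa_container_of. struct loop_ctx, struct object and the
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for liburcu's struct rcu_head (assumption for this sketch). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Local equivalent of a container_of helper, built on offsetof(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-event-loop context (e.g. its random number state). */
struct loop_ctx {
	int id;
	unsigned int destroyed;
};

/* The object remembers which context owns it. */
struct object {
	struct loop_ctx *owner;
	struct cb_head head;
};

/* Callback: finds the owning context from the head alone. */
static void destroy_object_cb(struct cb_head *head)
{
	struct object *obj = container_of(head, struct object, head);

	obj->owner->destroyed++;
}

/* Invokes the callback directly; returns 10 * loops[0] + loops[1] counts. */
int context_lookup_demo(void)
{
	struct loop_ctx loops[2] = { { 0, 0 }, { 1, 0 } };
	struct object a = { &loops[0], { NULL, NULL } };
	struct object b = { &loops[1], { NULL, NULL } };
	struct object c = { &loops[1], { NULL, NULL } };

	destroy_object_cb(&a.head);
	destroy_object_cb(&b.head);
	destroy_object_cb(&c.head);
	return (int)(loops[0].destroyed * 10 + loops[1].destroyed);
}
```

In a real program the callback would run on a call_rcu worker thread, so any
access to the owning context would need its own synchronization.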
If you manage to separate the context from the worker thread instances, then
you could use per-cpu call_rcu worker threads, which will eventually scale even
better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids
[1].
If it's legal to call call_rcu() from a call_rcu thread, which thread is going
to be used?
The call_rcu invoked from the call_rcu worker thread will queue the call_rcu
callback onto the queue handled by that worker thread. It does so by setting
URCU_TLS(thread_call_rcu_data) = crdp;
early in call_rcu_thread(). So any chained call_rcu is handled by the same
call_rcu worker thread doing the chaining, with the exception of teardown where
the pending callbacks are moved to the default worker thread.
Thanks,
Mathieu
[1]
https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoy...@efficios.com/
Thank you,
Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com