On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
> Hi,
>
> the documentation is pretty silent on this, and asking here is probably going
> to be faster than me trying to use the source to figure this out.
>
> Is it legal to call_rcu() from within the call_rcu() callback?

Yes. call_rcu callbacks can be chained.

Note that on clean program shutdown you'll need to call rcu_barrier() as many
times as you chained call_rcu callbacks if you want to be sure that no queued
callbacks remain. See this comment above urcu_call_rcu_exit():

 * Teardown the default call_rcu worker thread if there are no queued
 * callbacks on process exit. This prevents leaking memory.
 *
 * Here is how an application can ensure graceful teardown of this
 * worker thread:
 *
 * - An application queuing call_rcu callbacks should invoke
 *   rcu_barrier() before it exits.
 * - When chaining call_rcu callbacks, the number of calls to
 *   rcu_barrier() on application exit must match at least the maximum
 *   number of chained callbacks.
 * - If an application chains callbacks endlessly, it would have to be
 *   modified to stop chaining callbacks when it detects an application
 *   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
 *   after setting that flag.
 * - The statements above apply to a library which queues call_rcu
 *   callbacks, only it needs to invoke rcu_barrier in its library
 *   destructor.
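
To make the barrier-count rule concrete, here is a minimal single-threaded toy
model. It does not use liburcu: toy_call_rcu(), toy_rcu_barrier() and the
other names are hypothetical stand-ins, and the model assumes the worst case
where a barrier only covers callbacks that were already queued when it was
issued.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins with hypothetical names; this is NOT the liburcu API. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

static struct toy_head *queue_first;
static struct toy_head **queue_last = &queue_first;
static size_t queue_len;

/* Models call_rcu(): append a callback to the single worker queue. */
static void toy_call_rcu(struct toy_head *head,
		void (*func)(struct toy_head *head))
{
	head->next = NULL;
	head->func = func;
	*queue_last = head;
	queue_last = &head->next;
	queue_len++;
}

/*
 * Models rcu_barrier() in the worst case: only callbacks queued before
 * the barrier are run. Callbacks chained while running stay queued.
 */
static void toy_rcu_barrier(void)
{
	size_t pending = queue_len;

	while (pending--) {
		struct toy_head *head = queue_first;

		assert(head != NULL);
		queue_first = head->next;
		if (!queue_first)
			queue_last = &queue_first;
		queue_len--;
		head->func(head);	/* may call toy_call_rcu() again */
	}
}

struct chained {
	struct toy_head head;	/* first member, so a plain cast works */
	int requeues_left;
};

/* Callback that re-queues itself until the chain is exhausted. */
static void chain_cb(struct toy_head *head)
{
	struct chained *c = (struct chained *) head;

	if (c->requeues_left > 0) {
		c->requeues_left--;
		toy_call_rcu(head, chain_cb);
	}
}

/* Counts the barriers needed to drain a chain of chain_len callbacks. */
static int barriers_needed(int chain_len)
{
	struct chained c = { .requeues_left = chain_len - 1 };
	int barriers = 0;

	toy_call_rcu(&c.head, chain_cb);
	while (queue_len) {
		toy_rcu_barrier();
		barriers++;
	}
	return barriers;
}
```

In this model a chain of three callbacks needs three barriers before the queue
is empty, which is the reason the comment asks for at least as many
rcu_barrier() calls as the longest chain.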



> What about the other RCU (and CDS) API calls?

They can also be called from a call_rcu callback unless stated otherwise. For
instance, rcu_barrier() cannot be called from a call_rcu worker thread.


> How does that interact with create_call_rcu_data()?  I have <n> event loops
> and I am initializing <n> 1:1 call_rcu helper threads as I need to do some
> per-thread initialization as some of the destroy-like functions use random
> numbers (don't ask).

As I recall, set_thread_call_rcu_data() associates a call_rcu worker instance
with the current thread. All following call_rcu() invocations from that thread
are then queued into this per-thread call_rcu queue and handled by the
associated call_rcu worker thread.

But I wonder why you inherently need this 1:1 mapping, rather than using the
content of the structure containing the rcu_head to figure out which
per-thread data should be used?
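
A minimal sketch of that idea follows. It is self-contained and does not
include liburcu: struct rcu_head_stub stands in for struct rcu_head, the local
container_of() macro mirrors what liburcu's caa_container_of() does, and
struct loop_ctx, struct session and session_destroy_cb() are hypothetical
names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for liburcu's struct rcu_head so this builds without it. */
struct rcu_head_stub {
	struct rcu_head_stub *next;
	void (*func)(struct rcu_head_stub *head);
};

/* Same idea as liburcu's caa_container_of(). */
#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical per-event-loop state, e.g. random number generator state. */
struct loop_ctx {
	unsigned int rng_state;
	unsigned long destroyed;
};

/* Hypothetical RCU-protected object that remembers which loop owns it. */
struct session {
	int id;
	struct loop_ctx *owner;
	struct rcu_head_stub rcu;
};

/*
 * Callback body: the context travels with the object, so it does not
 * matter which call_rcu worker thread ends up running the callback.
 */
static void session_destroy_cb(struct rcu_head_stub *head)
{
	struct session *s;
	struct loop_ctx *ctx;

	assert(head != NULL);
	s = container_of(head, struct session, rcu);
	ctx = s->owner;

	/* Use the owning loop's generator state, not thread-local state. */
	ctx->rng_state = ctx->rng_state * 1103515245u + 12345u;
	ctx->destroyed++;
}
```

With real liburcu code the callback would take a struct rcu_head pointer and
be passed to call_rcu(). If the event loop thread and the worker thread can
touch the same struct loop_ctx concurrently, that state needs its own
synchronization, which this sketch leaves out.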

If you manage to separate the context from the worker thread instances, then 
you could use per-cpu call_rcu worker threads, which will eventually scale even 
better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids 
[1].


> If it's legal to call_rcu() from call_rcu thread, which thread is going to
> be used?

A call_rcu() invoked from a call_rcu worker thread queues the callback onto
the queue handled by that same worker thread. This works because the worker
thread sets

  URCU_TLS(thread_call_rcu_data) = crdp;

early in call_rcu_thread(). So any chained call_rcu is handled by the same 
call_rcu worker thread doing the chaining, with the exception of teardown where 
the pending callbacks are moved to the default worker thread.

Thanks,

Mathieu

[1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoy...@efficios.com/



Thank you,
Ondrej
--
Ondřej Surý (He/Him)
ond...@sury.org

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
