On Thu, Feb 25, 2021 at 09:22:48AM -0500, Mathieu Desnoyers wrote:
> Hi Paul,
> 
> Answering a question from Peter on IRC got me to look at 
> rcu_read_lock_trace(), and I see this:
> 
> static inline void rcu_read_lock_trace(void)
> {
>         struct task_struct *t = current;
> 
>         WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
>         barrier();
>         if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
>             t->trc_reader_special.b.need_mb)
>                 smp_mb(); // Pairs with update-side barriers
>         rcu_lock_acquire(&rcu_trace_lock_map);
> }
> 
> static inline void rcu_read_unlock_trace(void)
> {
>         int nesting;
>         struct task_struct *t = current;
> 
>         rcu_lock_release(&rcu_trace_lock_map);
>         nesting = READ_ONCE(t->trc_reader_nesting) - 1;
>         barrier(); // Critical section before disabling.
>         // Disable IPI-based setting of .need_qs.
>         WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
>         if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
>                 WRITE_ONCE(t->trc_reader_nesting, nesting);
>                 return;  // We assume shallow reader nesting.
>         }
>         rcu_read_unlock_trace_special(t, nesting);
> }
> 
> AFAIU, each thread keeps track of whether it is nested within an RCU
> read-side critical section with a counter, and grace periods iterate over
> all threads to make sure they are not within a read-side critical section
> before they can complete:
> 
> # define rcu_tasks_trace_qs(t)                                          \
>         do {                                                            \
>                 if (!likely(READ_ONCE((t)->trc_reader_checked)) &&      \
>                     !unlikely(READ_ONCE((t)->trc_reader_nesting))) {    \
>                         smp_store_release(&(t)->trc_reader_checked, true); \
>                         smp_mb(); /* Readers partitioned by store. */   \
>                 }                                                       \
>         } while (0)
> 
> It reminds me of the liburcu urcu-mb flavor which also deals with per-thread
> state to track whether threads are nested within a critical section:
> 
> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L90
> https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-mb.h#L125
> 
> static inline void _urcu_mb_read_lock_update(unsigned long tmp)
> {
>       if (caa_likely(!(tmp & URCU_GP_CTR_NEST_MASK))) {
>               _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, _CMM_LOAD_SHARED(urcu_mb_gp.ctr));
>               cmm_smp_mb();
>       } else
>               _CMM_STORE_SHARED(URCU_TLS(urcu_mb_reader).ctr, tmp + URCU_GP_COUNT);
> }
> 
> static inline void _urcu_mb_read_lock(void)
> {
>       unsigned long tmp;
> 
>       urcu_assert(URCU_TLS(urcu_mb_reader).registered);
>       cmm_barrier();
>       tmp = URCU_TLS(urcu_mb_reader).ctr;
>       urcu_assert((tmp & URCU_GP_CTR_NEST_MASK) != URCU_GP_CTR_NEST_MASK);
>       _urcu_mb_read_lock_update(tmp);
> }
> 
> The main difference between the two algorithms is that tasks-trace RCU
> within the kernel lacks the global "urcu_mb_gp.ctr" state snapshot, which
> is either incremented or flipped between 0 and 1 by the grace period.
> This allows RCU readers whose outermost nesting starts after the beginning
> of the grace period not to block progress of that grace period.
> 
> Without this, a steady flow of incoming tasks-trace-RCU readers can
> prevent the grace period from ever completing.
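> 
> For illustration, with such a snapshot the grace period only has to wait
> for readers whose outermost nesting began before the counter was changed.
> For the flavors that flip a phase bit, the check is roughly (a paraphrased
> sketch with approximate names, not the exact liburcu code):
> 
> static inline int reader_blocks_gp(unsigned long reader_ctr, unsigned long gp_ctr)
> {
> 	/*
> 	 * A reader blocks the grace period only if it is inside a critical
> 	 * section (non-zero nest count) AND it snapshotted the global
> 	 * counter before the grace period flipped the phase bit.
> 	 */
> 	return (reader_ctr & URCU_GP_CTR_NEST_MASK) &&
> 	       ((reader_ctr ^ gp_ctr) & URCU_GP_CTR_PHASE);
> }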
> 
> Or is this handled in a clever way that I am missing here?

There are several mechanisms designed to handle this.  The following
paragraphs describe these at a high level.

The trc_wait_for_one_reader() function is invoked on each task.  It uses
try_invoke_on_locked_down_task(), which, if the task is not currently
running, keeps it that way and invokes trc_inspect_reader().  If the
locked-down task is in a read-side critical section, its need_qs field
is set, which will cause the task's next rcu_read_unlock_trace() to report
the quiescent state.

If read-side memory barriers have been enabled, trc_inspect_reader()
is able to check whether a reader is active, and if not, reports the
quiescent state.  If there is a reader, trc_inspect_reader() reports
failure, which is another path into the mechanism described in the
following paragraph.
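
In other words, trc_inspect_reader() makes a decision roughly like the
following (a heavily simplified sketch using only the fields shown above,
not the actual kernel code):

/* Heavily simplified sketch, not the actual kernel code. */
static void trc_inspect_reader_sketch(struct task_struct *t)
{
	if (!READ_ONCE(t->trc_reader_nesting)) {
		/* Not in a reader: this task is done for this grace period. */
		WRITE_ONCE(t->trc_reader_checked, true);
	} else {
		/*
		 * In a reader: ask the task to report its own quiescent
		 * state from its next rcu_read_unlock_trace().
		 */
		WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
	}
}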

If the task could not be locked down because it is currently running,
then trc_wait_for_one_reader() attempts to send an IPI, which results in
trc_read_check_handler() rechecking for a read-side critical section
and either reporting the quiescent state immediately or proceeding in the
same way that trc_inspect_reader() does.  The trc_read_check_handler()
of course checks to make sure that the target task is still running
before doing anything.  If the attempt to send the IPI fails, then
the task is rechecked in a later pass.
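
Conceptually, the IPI handler's decision looks something like this (again
a simplified sketch rather than the actual kernel code):

/* Simplified sketch of the IPI handler's decision, not the actual code. */
static void trc_read_check_handler_sketch(void *arg)
{
	struct task_struct *t = current;
	struct task_struct *texp = arg;	/* Task the IPI was aimed at. */

	if (t != texp)
		return;	/* Target task moved away; recheck it in a later pass. */

	if (!READ_ONCE(t->trc_reader_nesting)) {
		/* Not in a reader: report the quiescent state immediately. */
		WRITE_ONCE(t->trc_reader_checked, true);
	} else {
		/* In a reader: the next rcu_read_unlock_trace() will report it. */
		WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
	}
}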

So what sequence of events did you find that causes these mechanisms
to fail?

                                                        Thanx, Paul
