On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote:
> A CPU coming online checks for an ongoing grace period and reports
> a quiescent state accordingly if needed. This special treatment, which
> shortcuts the expedited IPI, originated as an optimization in the
> following commit:
> 
>       338b0f760e84 ("rcu: Better hotplug handling for synchronize_sched_expedited()")
> 
> The point is to avoid an IPI targeting a CPU that is still becoming
> online or that has failed to become offline.
> 
> However, this is pointless and even error-prone for several reasons:
> 
> * If the CPU has been seen offline in the first round scanning offline
>   and idle CPUs, no IPI is even tried and the quiescent state is
>   reported on behalf of the CPU.
> 
> * This means that if the IPI fails, the CPU has just gone offline, so
>   it is unlikely to come back online right away, unless the CPU-hotplug
>   operation failed and rolled back. That is a rare event, and it can
>   afford to wait a jiffy for a new IPI to be issued.
> 
> * But then the "optimization" applying on failing CPU hotplug down only
>   applies to !PREEMPT_RCU.
> 
> * This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is
>   not set. As a result, it can race with remote QS reports on the same
>   rdp. This happens to be harmless today, but it is an accident waiting
>   to happen.
> 
> For all those reasons, remove this optimization, which does not look
> worth keeping.

Thank you for digging into this!

When I ran tests that removed the call to sync_sched_exp_online_cleanup()
a few months ago, I got grace-period hangs [1].  Has something changed
to make this safe?

                                                        Thanx, Paul

[1] https://docs.google.com/document/d/1-JQ4QYF1qid0TWSLa76O1kusdhER2wgm0dYdwFRUzU8/edit?usp=sharing

> Reported-by: Paul E. McKenney <paul...@kernel.org>
> Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> ---
>  kernel/rcu/tree.c     |  2 --
>  kernel/rcu/tree_exp.h | 45 ++-----------------------------------------
>  2 files changed, 2 insertions(+), 45 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..86935fe00397 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -151,7 +151,6 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
>                             unsigned long gps, unsigned long flags);
>  static void invoke_rcu_core(void);
>  static void rcu_report_exp_rdp(struct rcu_data *rdp);
> -static void sync_sched_exp_online_cleanup(int cpu);
>  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
>  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
>  static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
> @@ -4259,7 +4258,6 @@ int rcutree_online_cpu(unsigned int cpu)
>       raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>       if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>               return 0; /* Too early in boot for scheduler work. */
> -     sync_sched_exp_online_cleanup(cpu);
>  
>       // Stop-machine done, so allow nohz_full to disable tick.
>       tick_dep_clear(TICK_DEP_BIT_RCU);
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index caff16e441d1..a3f962eb7057 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -751,12 +751,8 @@ static void rcu_exp_handler(void *unused)
>       struct task_struct *t = current;
>  
>       /*
> -      * First, is there no need for a quiescent state from this CPU,
> -      * or is this CPU already looking for a quiescent state for the
> -      * current grace period?  If either is the case, just leave.
> -      * However, this should not happen due to the preemptible
> -      * sync_sched_exp_online_cleanup() implementation being a no-op,
> -      * so warn if this does happen.
> +      * WARN if the CPU is unexpectedly already looking for a
> +      * QS or has already reported one.
>        */
>       ASSERT_EXCLUSIVE_WRITER_SCOPED(rdp->cpu_no_qs.b.exp);
>       if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> @@ -803,11 +799,6 @@ static void rcu_exp_handler(void *unused)
>       WARN_ON_ONCE(1);
>  }
>  
> -/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -}
> -
>  /*
>   * Scan the current list of tasks blocked within RCU read-side critical
>   * sections, printing out the tid of each that is blocking the current
> @@ -885,38 +876,6 @@ static void rcu_exp_handler(void *unused)
>       rcu_exp_need_qs();
>  }
>  
> -/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -     unsigned long flags;
> -     int my_cpu;
> -     struct rcu_data *rdp;
> -     int ret;
> -     struct rcu_node *rnp;
> -
> -     rdp = per_cpu_ptr(&rcu_data, cpu);
> -     rnp = rdp->mynode;
> -     my_cpu = get_cpu();
> -     /* Quiescent state either not needed or already requested, leave. */
> -     if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> -         READ_ONCE(rdp->cpu_no_qs.b.exp)) {
> -             put_cpu();
> -             return;
> -     }
> -     /* Quiescent state needed on current CPU, so set it up locally. */
> -     if (my_cpu == cpu) {
> -             local_irq_save(flags);
> -             rcu_exp_need_qs();
> -             local_irq_restore(flags);
> -             put_cpu();
> -             return;
> -     }
> -     /* Quiescent state needed on some other CPU, send IPI. */
> -     ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> -     put_cpu();
> -     WARN_ON_ONCE(ret);
> -}
> -
>  /*
>   * Because preemptible RCU does not exist, we never have to check for
>   * tasks blocked within RCU read-side critical sections that are
> -- 
> 2.46.0
> 
