This series fixes a bug where rdp->defer_qs_pending can remain stuck in
PENDING when a preempted reader's quiescent state is reported up-tree via
a path other than the deferred-QS irq-work handler (FQS scan, hotplug
transition, expedited GP IPI, context switch). Once stuck, the pending
gate in rcu_read_unlock_special() silently suppresses all future arming
attempts on that CPU. Patches 1-7 add the missing PENDING -> IDLE
transitions. They also cover two further cases: the deferred-QS irq-work
handler running between segments of a compound section (per Paul
McKenney's counter-example), and the softirq deferred-QS arming path.
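
To make the failure mode concrete, here is a minimal userspace model of the
pending gate. It is only a sketch, not kernel code: struct cpu_model, the
QS_IDLE/QS_PENDING values and the model_*() functions are hypothetical
names invented for illustration, and the real arming and reporting paths do
considerably more work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-state model of rdp->defer_qs_pending. */
enum qs_state { QS_IDLE, QS_PENDING };

struct cpu_model {
	enum qs_state defer_qs_pending;	/* stands in for rdp->defer_qs_pending */
	int arm_count;			/* successful arming attempts */
};

/*
 * Models the pending gate in rcu_read_unlock_special(): arming is
 * allowed only while the state is IDLE. Returns true if armed.
 */
static bool model_try_arm(struct cpu_model *cpu)
{
	assert(cpu);
	if (cpu->defer_qs_pending == QS_PENDING)
		return false;	/* gate closed: attempt silently suppressed */
	cpu->defer_qs_pending = QS_PENDING;
	cpu->arm_count++;
	return true;
}

/*
 * Models a QS being reported by a path other than the irq-work handler
 * (for example an FQS scan). With clear_pending == false the state is
 * left at PENDING, which is the bug; with clear_pending == true the
 * PENDING -> IDLE transition is made, as the fix intends.
 */
static void model_report_qs_other_path(struct cpu_model *cpu, bool clear_pending)
{
	assert(cpu);
	if (clear_pending)
		cpu->defer_qs_pending = QS_IDLE;
}
```

In this model, arming once and then reporting with clear_pending == false
leaves every later model_try_arm() call returning false, while reporting
with clear_pending == true lets the next call arm again.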

Patch 8 adds a per-CPU rescue hrtimer that bounds the worst-case
deferred-QS reporting latency. When the irq-work handler lands in a clean
(non-reader, non-compound) context, it reports the quiescent state directly
via the new rcu_preempt_deferred_qs_try_report() helper. The rescue timer
reuses that helper so that, under preempt=none, the QS is reported promptly
without depending on the scheduler. The normal deferred-QS report path
cancels the rescue timer, so the timer does not fire once the quiescent
state has already been reported.
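
The interaction between the rescue timer and the normal report path can be
sketched with a small userspace model. This is an illustration under
simplifying assumptions, not the patch itself: struct rescue_model and the
model_*() functions are hypothetical, a boolean stands in for the hrtimer,
and model_try_report() only loosely mirrors the role of
rcu_preempt_deferred_qs_try_report().

```c
#include <assert.h>
#include <stdbool.h>

struct rescue_model {
	bool qs_needed;		/* a deferred QS still has to be reported */
	bool timer_armed;	/* stands in for the queued rescue hrtimer */
	int reports;		/* number of QS reports issued */
	int timer_fires;	/* number of times the timer callback ran */
};

/* Shared helper: report the QS only when called from a clean context. */
static bool model_try_report(struct rescue_model *m, bool clean_context)
{
	assert(m);
	if (!m->qs_needed || !clean_context)
		return false;
	m->qs_needed = false;
	m->reports++;
	return true;
}

/* A reader defers its QS and the rescue timer is armed as a backstop. */
static void model_arm(struct rescue_model *m)
{
	assert(m);
	m->qs_needed = true;
	m->timer_armed = true;
}

/* Normal report path: report the QS, then cancel the rescue timer. */
static void model_normal_report(struct rescue_model *m)
{
	assert(m);
	if (m->qs_needed) {
		m->qs_needed = false;
		m->reports++;
	}
	m->timer_armed = false;	/* models cancelling the hrtimer */
}

/* Timer expiry: returns true only if the callback actually ran. */
static bool model_timer_expire(struct rescue_model *m, bool clean_context)
{
	assert(m);
	if (!m->timer_armed)
		return false;	/* cancelled earlier: nothing fires */
	m->timer_armed = false;
	m->timer_fires++;
	model_try_report(m, clean_context);
	return true;
}
```

In this model, calling model_normal_report() before the timer expires means
model_timer_expire() does nothing, which is the behavior the cancellation
is meant to give. If the normal path never runs, the timer callback issues
the report itself, provided it lands in a clean context.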

This version is rebased on top of Paul's latest rcu/dev branch. The
rcutorture reader-end deboost test patches that were folded into v3 are now
in rcu/dev and have been dropped here. The git tree below additionally
carries two debug-only commits on top of the series ([TEST COMMIT], not for
merge): a detector that WARNs if defer_qs_pending is stuck at GP cleanup,
and an rcutorture tweak that gives the async deboost mechanisms up to 500us
before warning. With only the detector applied to unmodified mainline, it
reliably fires within 5 minutes under TREE03 rcutorture; with the full fix
applied I could not reproduce the issue.

The git tree with all patches can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
(tag: rcu-dqs-stuck-v4-20260625)

Change log:

Changes from v3 to v4:
- Rebased on top of Paul's latest rcu/dev branch.
- Dropped the rcutorture reader-end deboost test patches.
- Reclassified "rcutorture: give async deboost mechanisms up to 500us before
  WARN" as a debug-only [TEST COMMIT] (not for merge).
- "rcu: add per-CPU rescue hrtimer for deferred-QS reporting": cancel the
  rescue hrtimer from rcu_preempt_deferred_qs_irqrestore() via
  hrtimer_try_to_cancel() so it no longer fires after a normal report path
  has already reported the QS -- about a 90% reduction in rescue-timer fires
  under TREE03 with rcutorture.gp_exp=1.
- Patches 1-7 are unchanged from v3.

Changes from v2 to v3:
- Folded in the rcutorture "reader-end deboost testing" patches (three from
  Paul, two from me), previously posted separately as an RFC, so the fix
  and its test coverage can be reviewed together:
  https://lore.kernel.org/all/[email protected]/
- New patch "rcu: add per-CPU rescue hrtimer for deferred-QS reporting" to
  bound the worst-case deferred-QS reporting latency.
- New patch "rcu: clear defer_qs_pending in deferred-QS bail when nesting > 0".
- Reworked "rcu: clear defer_qs_pending in handler for compounded sections":
  the irq-work handler now reports the deferred QS directly via the new
  rcu_preempt_deferred_qs_try_report() helper when it lands in a clean
  context, instead of only nudging the scheduler.

Changes from v1 to v2:
- Dropped RFC tag now that softirq paths have been investigated.
- Added new patch "rcu: set need_resched on softirq deferred-QS arming
  path" to handle the softirq arming case that was deferred in v1.

Link to v3: https://lore.kernel.org/all/[email protected]/
Link to v2: https://lore.kernel.org/all/[email protected]/
Link to v1: https://lore.kernel.org/all/[email protected]/

Joel Fernandes (8):
  rcu: introduce rcu_defer_qs_clear() helper
  rcu: clear defer_qs_pending when notifying GP changes
  rcu: clear defer_qs_pending in handler for compounded sections
  rcu: drop redundant defer_qs_pending clear in irqrestore handler
  rcu: clear defer_qs_pending at expedited IPI entry
  rcu: set need_resched on softirq deferred-QS arming path
  rcu: clear defer_qs_pending in deferred-QS bail when nesting > 0
  rcu: add per-CPU rescue hrtimer for deferred-QS reporting

 kernel/rcu/tree.c        |   3 +
 kernel/rcu/tree.h        |   6 ++
 kernel/rcu/tree_exp.h    |   6 ++
 kernel/rcu/tree_plugin.h | 149 ++++++++++++++++++++++++++++++++++-----
 4 files changed, 147 insertions(+), 17 deletions(-)


base-commit: 47e26f0fd70890ddd810887a043303a365a8bf03
-- 
2.34.1

