Hi,

I am reporting an RCU stall / hung task issue triggered by a syzkaller
reproducer on Linux 7.0.0-rc1.

The system gets stuck with RCU stalls and multiple hung tasks. The
rcu_gp workqueue shows in-flight “process_srcu”, and kworkers running
“process_srcu” are blocked for >143s. At the same time, a repro task
blocks during perf event teardown (close()), waiting on SRCU.

Key log excerpts:

“Showing busy workqueues and worker pools: workqueue rcu_gp ...
in-flight: process_srcu”

“rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ... Tasks
blocked on level-0 rcu_node ...”

hung tasks:

kworker/... Workqueue: rcu_gp process_srcu
process_srcu -> try_check_zero -> synchronize_rcu_normal -> wait_for_completion

repro2 task blocked in close():
__x64_sys_close -> __fput -> perf_release -> perf_event_release_kernel
-> __free_event
-> perf_trace_destroy -> perf_trace_event_unreg -> __synchronize_srcu
-> wait_for_completion

My understanding / suspected wait chain:

repro task closes a perf fd and enters perf trace teardown:
perf_trace_event_unreg() waits for SRCU via __synchronize_srcu().

rcu_gp workers running process_srcu attempt to advance SRCU and end up calling
synchronize_rcu_normal(), waiting for a normal RCU grace period to complete.

RCU grace period does not complete (RCU stall reported on CPUs/tasks), causing:
__synchronize_srcu() to wait indefinitely, process_srcu to remain
blocked, and the system to stall.

This looks like a circular wait involving perf trace teardown + SRCU +
RCU GP progress (possibly aggravated by timing/RT scheduling).

Reproducer:
C reproducer:  https://pastebin.com/raw/L9hDPCrP
kernel config:  https://pastebin.com/raw/jCq8qdq7
console output : https://pastebin.com/raw/D8XETkeH

Kernel:
git tree: torvalds/linux
commit: 4d349ee5c7782f8b27f6cb550f112c5e26fff38d
kernel version: 7.0.0-rc1
hardware: QEMU Ubuntu 24.10


[  170.750583] Showing busy workqueues and worker pools:
[  170.750595] workqueue rcu_gp: flags=0x108
[  170.750609]   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=2 refcnt=3
[  170.750635]     in-flight: 9:process_srcu ,11:process_srcu
[  170.750723] pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=0s
workers=4 idle: 220
[  173.609623] perf: interrupt took too long (69138 > 69021), lowering
kernel.pe0
[  173.918434] perf: interrupt took too long (86584 > 86422), lowering
kernel.pe0
[  174.394436] perf: interrupt took too long (109613 > 108230),
lowering kernel.0
[  174.633565] perf: interrupt took too long (137226 > 137016),
lowering kernel.0
[  174.652442] perf: interrupt took too long (171805 > 171532),
lowering kernel.0
[  174.679472] perf: interrupt took too long (214868 > 214756),
lowering kernel.0
[  174.728383] perf: interrupt took too long (268722 > 268585),
lowering kernel.0
[  174.882802] perf: interrupt took too long (336002 > 335902),
lowering kernel.0
[  271.443907] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  271.443951] rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-1): P4093
[  271.443972] rcu:     (detected by 1, t=105002 jiffies, g=33573,
q=4342 ncpus=2)
[  271.443986] task:repro2          state:R  running task
stack:28792 pid:4093  tgid:4092  ppid:300    task_flags:0x400140
flags:0x00080012
[  271.444042] Call Trace:
[  271.444051]  <IRQ>
[  271.444058]  sched_show_task+0x3b5/0x650
[  271.444088]  ? trace_event_raw_event_sched_pi_setprio+0x440/0x440
[  271.444121]  ? rcu_dump_cpu_stacks+0x337/0x4b0
[  271.444146]  ? write_comp_data+0x1f/0x70
[  271.444164]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444182]  ? wq_watchdog_touch+0xec/0x170
[  271.444198]  rcu_sched_clock_irq+0x297b/0x31c0
[  271.444221]  ? tmigr_requires_handle_remote_up+0x143/0x1c0
[  271.444245]  ? rcu_momentary_eqs+0x40/0x40
[  271.444261]  ? __tmigr_cpu_deactivate+0x1a0/0x1a0
[  271.444283]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444301]  ? tmigr_requires_handle_remote+0x1cd/0x2a0
[  271.444324]  ? tmigr_handle_remote+0x320/0x320
[  271.444348]  ? write_comp_data+0x1f/0x70
[  271.444366]  ? write_comp_data+0x1f/0x70
[  271.444383]  ? __cgroup_account_cputime_field+0xb9/0x160
[  271.444406]  ? write_comp_data+0x1f/0x70
[  271.444424]  ? hrtimer_run_queues+0x64/0x450
[  271.444447]  update_process_times+0xfa/0x200
[  271.444469]  tick_nohz_handler+0x504/0x720
[  271.444487]  ? find_held_lock+0x2b/0x80
[  271.444505]  ? tick_do_update_jiffies64+0x380/0x380
[  271.444523]  ? lock_release+0xc9/0x2a0
[  271.444548]  __hrtimer_run_queues+0x771/0xb20
[  271.444568]  ? tick_do_update_jiffies64+0x380/0x380
[  271.444589]  ? write_comp_data+0x1f/0x70
[  271.444607]  ? enqueue_hrtimer+0x360/0x360
[  271.444626]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444649]  hrtimer_interrupt+0x36e/0x820
[  271.444676]  __sysvec_apic_timer_interrupt+0xb5/0x3b0
[  271.444695]  sysvec_apic_timer_interrupt+0x6b/0x80
[  271.444720]  </IRQ>
[  271.444724]  <TASK>
[  271.444730]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  271.444748] RIP: 0010:finish_task_switch+0x126/0x5c0
[  271.444771] Code: f6 4d 8d 7d 48 e8 8a 45 0a 00 31 f6 4c 89 ef e8
b0 fb ff ff 4c 89 ff e8 88 db b9 02 e8 93 c6 34 00 fb 65 4c 8b 3d 5a
34 cb 04 <49> 8d bf c8 14 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa
48 c1
[  271.444786] RSP: 0018:ffff888009db79b0 EFLAGS: 00000202
[  271.444798] RAX: 00000000001a071f RBX: ffff88800d6b4c80 RCX: 0000000000000006
[  271.444808] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81467bbd
[  271.444817] RBP: ffff888009db79f8 R08: 0000000000000001 R09: 0000000000000001
[  271.444826] R10: fffffbfff0ae32ea R11: ffffffff85719757 R12: ffff8880073f9a80
[  271.444836] R13: ffff88806d338fc0 R14: 0000000000000001 R15: ffff888009d19a80
[  271.444852]  ? finish_task_switch+0x11d/0x5c0
[  271.444876]  ? finish_task_switch+0xe0/0x5c0
[  271.444897]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444914]  ? __switch_to+0x854/0x1270
[  271.444936]  __schedule+0x1198/0x4190
[  271.444957]  ? io_schedule_timeout+0x80/0x80
[  271.444974]  ? mark_held_locks+0x49/0x70
[  271.444998]  preempt_schedule_irq+0x4e/0x90
[  271.445015]  irqentry_exit+0x17b/0x6c0
[  271.445037]  ? irqentry_enter+0x2a/0xd0
[  271.445058]  ? trace_hardirqs_off_finish+0x12f/0x160
[  271.445080]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  271.445096] RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x40
[  271.445127] Code: 00 00 ff 00 09 c2 75 07 4c 8b 81 e0 14 00 00 4c
89 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa bf 02 00 00 00 4c
8b 0c 24 <65> 48 8b 35 9b a7 9e 04 e8 46 ff ff ff 84 c0 74 20 48 8b 96
d0 14
[  271.445141] RSP: 0018:ffff888009db7c28 EFLAGS: 00000246
[  271.445152] RAX: dffffc0000000000 RBX: ffff88800df47b58 RCX: 0000000000000000
[  271.445161] RDX: 1ffff11001be8f71 RSI: ffff888009d19a80 RDI: 0000000000000002
[  271.445171] RBP: ffff888009d18000 R08: 0000000000000000 R09: ffffffff8167adf7
[  271.445180] R10: ffffed1001d87110 R11: ffff88800ec38883 R12: 0000000000000001
[  271.445190] R13: ffff888009db7cd0 R14: ffff888009d18028 R15: 00000000000000ec
[  271.445203]  ? __futex_wake_mark+0xb7/0xe0
[  271.445222]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.445240]  __futex_wake_mark+0xb7/0xe0
[  271.445255]  futex_wake_mark+0xa4/0x190
[  271.445272]  futex_wake+0x441/0x540
[  271.445289]  ? futex_wake_mark+0x190/0x190
[  271.445306]  ? percpu_counter_add_batch+0x11b/0x260
[  271.445329]  ? write_comp_data+0x1f/0x70
[  271.445348]  do_futex+0x26b/0x360
[  271.445363]  ? __ia32_sys_get_robust_list+0x140/0x140
[  271.445380]  ? lock_is_held_type+0x9b/0x110
[  271.445404]  __x64_sys_futex+0x1c9/0x480
[  271.445420]  ? do_futex+0x360/0x360
[  271.445433]  ? write_comp_data+0x1f/0x70
[  271.445450]  ? fd_install+0x1ec/0x4e0
[  271.445471]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.445488]  ? __sys_socket+0x4b/0x130
[  271.445508]  do_syscall_64+0x115/0x690
[  271.445529]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  271.445544] RIP: 0033:0x7fdfffbe4fc9
[  271.445568] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 8e 0d 00 f7 d8 64 89
01 48
[  271.445582] RSP: 002b:00007fdfffaebde8 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[  271.445595] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffbe4fc9
[  271.445605] RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000055b24d1850ec
[  271.445614] RBP: 00007fdfffaebe00 R08: 0000000000000000 R09: 0000000000000000
[  271.445623] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffb7d2901e
[  271.445632] R13: 00007fffb7d2901f R14: 00007fdfffaebf00 R15: 0000000000022000
[  271.445650]  </TASK>
[  318.431885] INFO: task kworker/0:0:9 blocked for more than 143 seconds.
[  318.431913]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.431921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.431928] task:kworker/0:0     state:D stack:27912 pid:9
tgid:9     ppid:2      task_flags:0x4208060 flags:0x00080000
[  318.431983] Workqueue: rcu_gp process_srcu
[  318.432032] Call Trace:
[  318.432038]  <TASK>
[  318.432049]  __schedule+0x1190/0x4190
[  318.432078]  ? io_schedule_timeout+0x80/0x80
[  318.432095]  ? lock_release+0xc9/0x2a0
[  318.432119]  schedule+0xd1/0x260
[  318.432134]  schedule_timeout+0x240/0x280
[  318.432157]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.432182]  ? mark_held_locks+0x49/0x70
[  318.432203]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.432224]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.432245]  ? trace_hardirqs_on+0x18/0x170
[  318.432267]  wait_for_completion+0x169/0x320
[  318.432286]  ? wait_for_completion_killable+0x410/0x410
[  318.432304]  ? mark_held_locks+0x49/0x70
[  318.432324]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.432348]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432372]  ? trace_hardirqs_on+0x18/0x170
[  318.432390]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432414]  synchronize_rcu_normal+0x208/0x5f0
[  318.432433]  ? start_poll_synchronize_rcu_full+0x90/0x90
[  318.432453]  ? do_raw_spin_lock+0x123/0x290
[  318.432481]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432504]  ? rt_mutex_slowunlock+0x824/0xaf0
[  318.432526]  ? lock_is_held_type+0x9b/0x110
[  318.432555]  try_check_zero+0x429/0x630
[  318.432583]  process_srcu+0x4c1/0x16c0
[  318.432607]  ? lock_acquire+0x187/0x2e0
[  318.432626]  ? process_scheduled_works+0x4ca/0x1ac0
[  318.432650]  ? lock_release+0xc9/0x2a0
[  318.432674]  process_scheduled_works+0x553/0x1ac0
[  318.432704]  ? insert_work+0x190/0x190
[  318.432723]  ? write_comp_data+0x1f/0x70
[  318.432741]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.432761]  ? lock_is_held_type+0x9b/0x110
[  318.432784]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.432805]  worker_thread+0x5a9/0xd10
[  318.432830]  ? bh_worker+0x740/0x740
[  318.435937]  kthread+0x3f9/0x530
[  318.435991]  ? kthread_affine_node+0x2a0/0x2a0
[  318.436060]  ret_from_fork+0x666/0xab0
[  318.436098]  ? native_tss_update_io_bitmap+0x6c0/0x6c0
[  318.436139]  ? write_comp_data+0x1f/0x70
[  318.436175]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.436213]  ? __switch_to+0x854/0x1270
[  318.436255]  ? kthread_affine_node+0x2a0/0x2a0
[  318.436301]  ret_from_fork_asm+0x11/0x20
[  318.436360]  </TASK>
[  318.436373] INFO: task kworker/0:1:11 blocked for more than 143 seconds.
[  318.436393]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.436411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.436422] task:kworker/0:1     state:D stack:26592 pid:11
tgid:11    ppid:2      task_flags:0x4208060 flags:0x00080000
[  318.436528] Workqueue: rcu_gp process_srcu
[  318.436571] Call Trace:
[  318.436580]  <TASK>
[  318.436595]  __schedule+0x1190/0x4190
[  318.436639]  ? io_schedule_timeout+0x80/0x80
[  318.436676]  ? lock_release+0xc9/0x2a0
[  318.436726]  schedule+0xd1/0x260
[  318.436759]  schedule_timeout+0x240/0x280
[  318.436804]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.522895]  ? mark_held_locks+0x49/0x70
[  318.522930]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.522952]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.522977]  ? trace_hardirqs_on+0x18/0x170
[  318.523008]  wait_for_completion+0x169/0x320
[  318.523029]  ? wait_for_completion_killable+0x410/0x410
[  318.523047]  ? mark_held_locks+0x49/0x70
[  318.523068]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.523088]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523110]  ? trace_hardirqs_on+0x18/0x170
[  318.523128]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523152]  synchronize_rcu_normal+0x208/0x5f0
[  318.523171]  ? start_poll_synchronize_rcu_full+0x90/0x90
[  318.523190]  ? do_raw_spin_lock+0x123/0x290
[  318.523219]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523242]  ? rt_mutex_slowunlock+0x824/0xaf0
[  318.523264]  ? lock_is_held_type+0x9b/0x110
[  318.523294]  try_check_zero+0x429/0x630
[  318.523321]  process_srcu+0x4c1/0x16c0
[  318.523346]  ? lock_acquire+0x187/0x2e0
[  318.523365]  ? process_scheduled_works+0x4ca/0x1ac0
[  318.523389]  ? lock_release+0xc9/0x2a0
[  318.523412]  process_scheduled_works+0x553/0x1ac0
[  318.523442]  ? insert_work+0x190/0x190
[  318.523461]  ? write_comp_data+0x1f/0x70
[  318.523478]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523498]  ? lock_is_held_type+0x9b/0x110
[  318.523521]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523541]  worker_thread+0x5a9/0xd10
[  318.523566]  ? bh_worker+0x740/0x740
[  318.523587]  kthread+0x3f9/0x530
[  318.523605]  ? kthread_affine_node+0x2a0/0x2a0
[  318.523627]  ret_from_fork+0x666/0xab0
[  318.523645]  ? native_tss_update_io_bitmap+0x6c0/0x6c0
[  318.523664]  ? write_comp_data+0x1f/0x70
[  318.523680]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523698]  ? __switch_to+0x854/0x1270
[  318.523717]  ? kthread_affine_node+0x2a0/0x2a0
[  318.523738]  ret_from_fork_asm+0x11/0x20
[  318.523764]  </TASK>
[  318.526244] INFO: task repro2:4086 blocked for more than 143 seconds.
[  318.526258]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.526267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.526273] task:repro2          state:D stack:25512 pid:4086
tgid:4086  ppid:298    task_flags:0x400040 flags:0x00080002
executing program[  318.526322] Call Trace:
[  318.526327]  <TASK>

[  318.526334]  __schedule+0x1190/0x4190
[  318.526356]  ? io_schedule_timeout+0x80/0x80
[  318.526373]  ? lock_release+0xc9/0x2a0
[  318.526396]  schedule+0xd1/0x260
[  318.526411]  schedule_timeout+0x240/0x280
[  318.526432]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.526458]  ? mark_held_locks+0x49/0x70
[  318.526478]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.526499]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.526520]  ? trace_hardirqs_on+0x18/0x170
[  318.526539]  wait_for_completion+0x169/0x320
[  318.526556]  ? __lock_acquire+0x55a/0x1ef0
[  318.526577]  ? wait_for_completion_killable+0x410/0x410
[  318.526597]  ? lockdep_init_map_type+0x4b/0x210
[  318.526621]  __synchronize_srcu+0x143/0x230
[  318.526643]  ? start_poll_synchronize_srcu+0x10/0x10
[  318.526666]  ? rcu_tasks_pregp_step+0x10/0x10
[  318.526687]  ? kvm_clock_read+0x3b/0x60
[  318.526707]  ? write_comp_data+0x1f/0x70
[  318.526723]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.526741]  ? ktime_get_mono_fast_ns+0x19d/0x2b0
[  318.526766]  ? synchronize_srcu+0x53/0x260
[  318.526789]  perf_trace_event_unreg.isra.0+0xb8/0x1e0
[  318.526807]  perf_trace_destroy+0xc3/0x1c0
[  318.526822]  ? perf_tp_event_init+0x120/0x120
[  318.529171]  __free_event+0x257/0xc10
[  318.529200]  ? perf_event_release_kernel+0x460/0x460
[  318.529215]  put_event+0x3c/0x90
[  318.529230]  perf_event_release_kernel+0x357/0x460
[  318.529248]  ? perf_event_release_kernel+0x460/0x460
[  318.529263]  perf_release+0x37/0x50
[  318.529277]  __fput+0x420/0xb80
[  318.529303]  fput_close_sync+0x10f/0x230
[  318.529325]  ? alloc_file_clone+0x110/0x110
[  318.529345]  ? dnotify_flush+0x7f/0x4c0
[  318.529372]  __x64_sys_close+0x8f/0x120
[  318.529390]  do_syscall_64+0x115/0x690
[  318.529410]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  318.529425] RIP: 0033:0x7fdfffcd611b
[  318.529436] RSP: 002b:00007fffb7d290d0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[  318.529450] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffcd611b
[  318.529460] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[  318.529469] RBP: 00007fffb7d29100 R08: 0000000000000000 R09: 0000000000000032
[  318.529478] R10: 0000000000000000 R11: 0000000000000293 R12: 000055b24d16e3f0
[  318.529486] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  318.529503]  </TASK>


Could you please advise whether this is a known issue/regression, and
what additional debug data would be most helpful? I can rerun with
extra debugging options if needed.

Thanks,
Zw Tang

Reply via email to