Bug#1087981: linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression)

Salvatore Bonaccorso Thu, 21 Nov 2024 08:08:01 -0800

hi,

On Thu, Nov 21, 2024 at 12:13:39PM +0100, Willi Mann wrote:
> Package: src:linux
> Version: 6.1.115-1
> Severity: important
> Tags: upstream
> 
> Hi,
> 
> I'm seeing regularly kernel logs of the form below, combined with the system
> feeling terribly slow. This is a regression as I didn't have these problems
> with linux-image-6.1.0-26-amd64. The issue occurs on a VPS.
> 
> The problems seem to be fixed with the linux-image-6.1.0-28-amd64-unsigned
> 6.1.118-1~test1 from the repo at
> https://people.debian.org/~carnil/tmp/linux/1086447/ (i.e., with the affected
> version, everything is terribly slow from the beginning while the previous
> version and test version don't show this symptom + nothing related yet in the
> kernel log).
> 
> I thought about adding my report to #1087602 as it is also a regression from
> linux-image-6.1.0-26-amd64 on virtualized hardware and sounds related, but I
> decided to file as a different bug because the backtrace look different.
> 
> thanks
> Willi
> 
> 
> [462022.185732] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [462022.186313] rcu:    0-...0: (1 GPs behind) idle=313c/1/0x4000000000000000 
> softirq=1652560/1652574 fqs=976
> [462022.186876]         (detected by 1, t=5255 jiffies, g=2935025, q=288 
> ncpus=2)
> [462022.186880] Sending NMI from CPU 1 to CPUs 0:
> [462037.911283] NMI backtrace for cpu 0
> [462037.911288] CPU: 0 PID: 12664 Comm: systemd-journal Tainted: G            
>  L     6.1.0-27-amd64 #1  Debian 6.1.115-1
> [462037.911292] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [462037.911293] RIP: 0010:asm_sysvec_reschedule_ipi+0x0/0x20
> [462037.911301] Code: e9 15 07 00 00 0f 1f 44 00 00 0f 01 ca fc 6a ff e8 d5 
> 05 00 00 48 89 c4 48 89 e7 e8 fa f9 e2 ff e9 f5 06 00 00 0f 1f 44 00 00 <0f> 
> 01 ca fc 6a ff e8 b5 05 00 00 48 89 c4 48 89 e7 e8 fa 1b e3 ff
> [462037.911302] RSP: 0000:fffffe0000002fd8 EFLAGS: 00000002
> [462037.911304] RAX: 0000000000000001 RBX: 0000000000fbafb0 RCX: 
> 0000000000fbafb0
> [462037.911306] RDX: 0000000000000006 RSI: 00007efd91dbafb0 RDI: 
> 000055c938078790
> [462037.911306] RBP: 000055c938078790 R08: 000055c938078970 R09: 
> 000055c9380787c8
> [462037.911307] R10: 0000000000000002 R11: 00007efd93d6f288 R12: 
> 00007ffecfbc6918
> [462037.911308] R13: 00007ffecfbc6890 R14: 0000000000000006 R15: 
> 00007efd91dbafb0
> [462037.911311] FS:  00007efd9573b5c0(0000) GS:ffff9201f9c00000(0000) 
> knlGS:ffff9201f9c00000
> [462037.911312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [462037.911313] CR2: 00007efd93d6f1f4 CR3: 00000001022b8000 CR4: 
> 00000000003506f0
> [462037.911316] Call Trace:
> [462060.884093] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [462060.884626] rcu:    1-...!: (1 ticks this GP) 
> idle=065c/1/0x4000000000000000 softirq=1599519/1599519 fqs=18
> [462060.885310]         (detected by 0, t=5742 jiffies, g=2935029, q=224 
> ncpus=2)
> [462060.885314] Sending NMI from CPU 0 to CPUs 1:
> [462073.753390] NMI backtrace for cpu 1
> [462073.753396] CPU: 1 PID: 85627 Comm: kworker/1:1 Tainted: G             L  
>    6.1.0-27-amd64 #1  Debian 6.1.115-1
> [462073.753401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [462073.753403] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
> [462073.753421] RIP: 0010:asm_sysvec_apic_timer_interrupt+0x0/0x20
> [462073.753427] Code: e9 55 07 00 00 0f 1f 44 00 00 0f 01 ca fc 6a ff e8 15 
> 06 00 00 48 89 c4 48 89 e7 e8 ba 1f e3 ff e9 35 07 00 00 0f 1f 44 00 00 <0f> 
> 01 ca fc 6a ff e8 f5 05 00 00 48 89 c4 48 89 e7 e8 6a 1e e3 ff
> [462073.753429] RSP: 0018:ffffbc1a46d5fdc8 EFLAGS: 00000016
> [462073.753430] RAX: 0000000000000000 RBX: 0000000000001400 RCX: 
> 0000000000000200
> [462073.753431] RDX: 0000000000000000 RSI: ffffbc1a40829000 RDI: 
> ffffbc1a4240c000
> [462073.753432] RBP: ffffbc1a4240b400 R08: ffff920162cfd848 R09: 
> ffff920147dc1074
> [462073.753433] R10: 0000000000000007 R11: 0000000000000002 R12: 
> ffffbc1a40828400
> [462073.753433] R13: ffffbc1a40828400 R14: ffffbc1a4240b400 R15: 
> ffff920144f7bcd0
> [462073.753437] FS:  0000000000000000(0000) GS:ffff9201f9d00000(0000) 
> knlGS:0000000000000000
> [462073.753438] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [462073.753439] CR2: 00007f85dd5ab0d8 CR3: 0000000122694000 CR4: 
> 00000000003506e0
> [462073.753442] Call Trace:
> [462073.753444]  <TASK>
> [462073.753445] RIP: 0010:memcpy_toio+0x6d/0xb0
> [462073.753448] Code: 5c e9 b7 ce 88 00 48 85 db 74 f2 40 f6 c5 01 75 45 48 
> 83 fb 01 76 06 40 f6 c5 02 75 25 48 89 d9 48 89 ef 4c 89 e6 48 c1 e9 02 <f3> 
> a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 5b 5d 41 5c e9 7b ce
> [462073.753449] RSP: 0018:ffffbc1a46d5fdf8 EFLAGS: 00000216
> [462073.753452]  ? drm_fb_helper_damage_work+0x2df/0x3a0 [drm_kms_helper]
> [462073.753463]  ? process_one_work+0x1c7/0x380
> [462073.753467]  ? worker_thread+0x4d/0x380
> [462073.753469]  ? rescuer_thread+0x3a0/0x3a0
> [462073.753471]  ? kthread+0xda/0x100
> [462073.753473]  ? kthread_complete_and_exit+0x20/0x20
> [462073.753475]  ? ret_from_fork+0x22/0x30
> [462073.753479]  </TASK>
> [462073.753757] rcu: rcu_preempt kthread timer wakeup didn't happen for 5649 
> jiffies! g2935029 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [462073.754323] rcu:    Possible timer handling issue on cpu=0 
> timer-softirq=4828143
> [462073.754819] rcu: rcu_preempt kthread starved for 5650 jiffies! g2935029 
> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> [462073.755256] rcu:    Unless rcu_preempt kthread gets sufficient CPU time, 
> OOM is now expected behavior.
> [462073.755759] rcu: RCU grace-period kthread stack dump:
> [462073.756240] task:rcu_preempt     state:I stack:0     pid:15    ppid:2     
>  flags:0x00004000
> [462073.756245] Call Trace:
> [462073.756247]  <TASK>
> [462073.756251]  __schedule+0x34d/0x9e0


Can you please provide a full boot log to see if we can pin-point the
cause?

Regards,
Salvatore

Bug#1087981: linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression)

Reply via email to