Public bug reported:
QEMU processes stuck on io_uring lock in Ubuntu 24.04, on kernel
6.8.0-56.
For the past two weeks I have been migrating more hosts to Ubuntu 24.04, coming from
22.04. Since then I occasionally see a VM whose QEMU process gets stuck in the D
(uninterruptible sleep) state; dmesg then shows the same call trace as pasted below.
On Ubuntu 22.04 I was running the HWE kernel packages with versions 6.5
and 6.8, although I wasn't running 6.8 as much as I am now.
I did find a locking patch in the 6.8.0-56 changelog that drops and reacquires
ctx->uring_lock, which appears to be the same mutex the blocked task below is waiting
on (mutex_lock called from __do_sys_io_uring_enter), and I am wondering whether it
could be the cause:
+
+       /*
+        * For silly syzbot cases that deliberately overflow by huge
+        * amounts, check if we need to resched and drop and
+        * reacquire the locks if so. Nothing real would ever hit this.
+        * Ideally we'd have a non-posting unlock for this, but hard
+        * to care for a non-real case.
+        */
+       if (need_resched()) {
+               io_cq_unlock_post(ctx);
+               mutex_unlock(&ctx->uring_lock);
+               cond_resched();
+               mutex_lock(&ctx->uring_lock);
+               io_cq_lock(ctx);
+       }
/proc/cmdline: BOOT_IMAGE=/boot/vmlinuz-6.8.0-56-generic
root=/dev/mapper/hv9-root ro verbose security=apparmor rootdelay=10
max_loop=16 default_hugepagesz=1G hugepagesz=1G hugepages=448
libata.force=noncq iommu=pt crashkernel=512M-4G:128M,4G-8G:256M,8G-:512M
dmesg snippet:
[Thu Mar 27 18:50:48 2025] INFO: task qemu-system-x86:15480 blocked for more
than 552 seconds.
[Thu Mar 27 18:50:48 2025] Tainted: G OE 6.8.0-56-generic
#58-Ubuntu
[Thu Mar 27 18:50:48 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[Thu Mar 27 18:50:48 2025] task:qemu-system-x86 state:D stack:0 pid:15480
tgid:15480 ppid:1 flags:0x00024006
[Thu Mar 27 18:50:48 2025] Call Trace:
[Thu Mar 27 18:50:48 2025] <TASK>
[Thu Mar 27 18:50:48 2025] __schedule+0x27c/0x6b0
[Thu Mar 27 18:50:48 2025] schedule+0x33/0x110
[Thu Mar 27 18:50:48 2025] schedule_preempt_disabled+0x15/0x30
[Thu Mar 27 18:50:48 2025] __mutex_lock.constprop.0+0x42f/0x740
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] __mutex_lock_slowpath+0x13/0x20
[Thu Mar 27 18:50:48 2025] mutex_lock+0x3c/0x50
[Thu Mar 27 18:50:48 2025] __do_sys_io_uring_enter+0x2e7/0x4a0
[Thu Mar 27 18:50:48 2025] __x64_sys_io_uring_enter+0x22/0x40
[Thu Mar 27 18:50:48 2025] x64_sys_call+0xeda/0x25a0
[Thu Mar 27 18:50:48 2025] do_syscall_64+0x7f/0x180
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? __x64_sys_ioctl+0xbb/0xf0
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? syscall_exit_to_user_mode+0x86/0x260
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] ? do_syscall_64+0x8c/0x180
[Thu Mar 27 18:50:48 2025] ? irqentry_exit+0x43/0x50
[Thu Mar 27 18:50:48 2025] ? srso_return_thunk+0x5/0x5f
[Thu Mar 27 18:50:48 2025] entry_SYSCALL_64_after_hwframe+0x78/0x80
At this moment I have not tried to reproduce this; I can try running fio on a
test host with the same kernel to see whether I can break it consistently.
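As an alternative (or in addition) to fio, something like the liburing loop below
might exercise the CQ-overflow flush path that the quoted patch touches. This is
only an untested sketch based on my assumptions (the file /dev/zero, the queue size
of 8 and the iteration count are arbitrary, and I have not confirmed it reaches the
patched code, let alone reproduces the hang); the idea is to queue more reads than
it reaps, so completions pile up on the overflow list, while repeatedly re-entering
the kernel with a wait:

/* repro.c - untested sketch, not a confirmed reproducer */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        static char buf[4096];
        int fd;

        /* Small SQ (8 entries) so the CQ (16 entries by default) fills quickly. */
        if (io_uring_queue_init(8, &ring, 0) < 0) {
                perror("io_uring_queue_init");
                return 1;
        }

        fd = open("/dev/zero", O_RDONLY);       /* arbitrary always-readable file */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (long round = 0; round < 1000000; round++) {
                struct io_uring_sqe *sqe;

                /* Queue as many reads as the SQ will hold... */
                while ((sqe = io_uring_get_sqe(&ring)) != NULL)
                        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);

                /* ...and enter the kernel with a wait, so liburing sets
                 * IORING_ENTER_GETEVENTS and any overflowed CQEs get a
                 * chance to be flushed inside io_uring_enter(). */
                io_uring_submit_and_wait(&ring, 1);

                /* Reap only one completion per round, so the CQ ring stays
                 * full and further completions land on the overflow list. */
                if (io_uring_peek_cqe(&ring, &cqe) == 0)
                        io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
}

It should build with something like gcc repro.c -o repro -luring on a host with
the liburing development package installed.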
I also have a crash dump taken from one of the affected hosts.
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2105471
Title:
io_uring process deadlock
Status in linux package in Ubuntu:
New