Patch series sent to ML:
https://lists.ubuntu.com/archives/kernel-team/2024-December/155884.html

SRU Justification

[Impact]

Since upstream commit c8bd1f7f3e61 ("virtio_net: add support for Byte
Queue Limits"), BQL crashes have been observed. One crash pattern was
addressed by upstream commit b96ed2c97c79 ("virtio_net: move
netdev_tx_reset_queue() call before RX napi enable"), but other
patterns remain unresolved.

Most notably, GCP instances frequently fail boot tests with BQL
crashes. To end users, this issue typically appears as an extremely
slow instance boot. Even when booting succeeds, the instance remains
susceptible to kernel panics under certain conditions, so this issue
needs to be resolved.

Only Oracular is affected.

[Fix]

The issue is resolved by the following patch series:
https://lore.kernel.org/all/20241206011047.923923-1-koichiro....@canonical.com/

The 6.11.y branch is now EOL, so the patch series will not land in the
upstream stable tree.

[Test Plan]

With the fix applied, reboot some GCP instances multiple times to
verify that the BQL crash no longer occurs.

[Where problems could occur]

The fix impacts only virtio-net, so any regression would appear as
unexpected behavior in virtio-net or, potentially, as a kernel crash.

[Other Info]

[PATCH 1/3] resolves the issue observed on GCP.
[PATCH 2/3] and [PATCH 3/3] resolve similar issues that have not been
observed on our testing infrastructure but are still worth applying to
prevent potential kernel panics due to BQL crashes.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2089684

Title: oracular: ubuntu_boot lib/dynamic_queue_limits.c:99!

Status in linux package in Ubuntu: Triaged
Status in linux source package in Oracular: Triaged
Status in linux source package in Plucky: Triaged

Bug description:
We observed this in GCP cloud, where boot test cases pass but a stack
trace is printed afterwards due to an error in virtio_net.

[ 9.326748] cloud-init[617]: Cloud-init v.
24.4~3+really24.3.1-0ubuntu4 running 'init-local' at Tue, 26 Nov 2024 12:47:39 +0000. Up 9.28 seconds.
[ 10.470815] kernel BUG at lib/dynamic_queue_limits.c:99!
[ 10.476289] Oops: invalid opcode: 0000 [#1] SMP PTI
[ 10.481279] CPU: 0 UID: 0 PID: 644 Comm: ip Not tainted 6.11.0-1005-gcp #5-Ubuntu
[ 10.488887] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
[ 10.498742] RIP: 0010:dql_completed+0x191/0x1b0
[ 10.503629] Code: 63 ce 01 48 89 47 58 e9 03 ff ff ff 45 85 e4 41 0f 95 c3 39 d9 0f 95 c1 41 84 cb 74 05 45 85 ed 78 0a 44 89 d1 e9 e5 fe ff ff <0f> 0b 01 c0 44 89 d1 29 c1 b8 00 00 00 00 0f 48 c8 eb 84 66 66 2e
[ 10.522602] RSP: 0018:ffffab5f00003cb0 EFLAGS: 00010297
[ 10.528032] RAX: 0000000000000036 RBX: ffffab5f00003d10 RCX: 0000000000000000
[ 10.535541] RDX: 0000000000000000 RSI: 0000000000000036 RDI: ffff97d445b28d00
[ 10.542876] RBP: ffffab5f00003d00 R08: 0000000000000000 R09: 0000000000000000
[ 10.550384] R10: ffff97d462c00000 R11: ffffffffbc0060c0 R12: ffff97d441aec000
[ 10.557875] R13: ffffab5f00003ccc R14: ffff97d445b28c00 R15: ffff97d446ac0a00
[ 10.565206] FS: 0000793839e71800(0000) GS:ffff97d462c00000(0000) knlGS:0000000000000000
[ 10.573458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.579408] CR2: 00005fee8b709198 CR3: 0000000003e78006 CR4: 00000000003706f0
[ 10.586663] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 10.593999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 10.601350] Call Trace:
[ 10.604094] <IRQ>
[ 10.606211] ? show_trace_log_lvl+0x1be/0x310
[ 10.611074] ? show_trace_log_lvl+0x1be/0x310
[ 10.615631] ? free_old_xmit+0x4a/0xa0
[ 10.619633] ? show_regs.part.0+0x22/0x30
[ 10.623774] ? __die_body.cold+0x8/0x10
[ 10.627735] ? __die+0x2a/0x40
[ 10.630904] ? die+0x2f/0x60
[ 10.633898] ? do_trap+0xc8/0x110
[ 10.637675] ? do_error_trap+0x71/0xb0
[ 10.641811] ? dql_completed+0x191/0x1b0
[ 10.645854] ? exc_invalid_op+0x52/0x80
[ 10.649896] ? dql_completed+0x191/0x1b0
[ 10.654011] ? asm_exc_invalid_op+0x1b/0x20
[ 10.658325] ? dql_completed+0x191/0x1b0
[ 10.662456] ? __free_old_xmit+0xe1/0x160
[ 10.666569] free_old_xmit+0x4a/0xa0
[ 10.670339] virtnet_poll_cleantx.isra.0+0xca/0x130
[ 10.675368] virtnet_poll+0x5d/0x610
[ 10.679049] ? __enqueue_entity+0x10b/0x150
[ 10.683438] ? enqueue_entity+0xde/0x530
[ 10.687489] __napi_poll+0x30/0x190
[ 10.691143] net_rx_action+0x212/0x410
[ 10.695002] handle_softirqs+0xe7/0x310
[ 10.699206] __do_softirq+0x10/0x18
[ 10.703163] do_softirq.part.0+0x3f/0x80
[ 10.707211] </IRQ>
[ 10.709419] <TASK>
[ 10.711639] __local_bh_enable_ip+0x4e/0x50
[ 10.716069] virtnet_open+0x108/0x360
[ 10.719878] __dev_open+0x109/0x1d0
[ 10.723580] __dev_change_flags+0x1d8/0x230
[ 10.727976] dev_change_flags+0x27/0x80
[ 10.731942] do_setlink+0x39e/0xd90
[ 10.735777] ? rtnetlink_rcv_msg+0x2e8/0x440
[ 10.740156] ? __nla_validate_parse+0x49/0x1b0
[ 10.744717] __rtnl_newlink+0x5c8/0x770
[ 10.748773] rtnl_newlink+0x77/0xa0
[ 10.752378] rtnetlink_rcv_msg+0x2d5/0x440
[ 10.756585] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 10.761479] netlink_rcv_skb+0x52/0x100
[ 10.765484] rtnetlink_rcv+0x15/0x30
[ 10.769180] netlink_unicast+0x226/0x350
[ 10.773253] netlink_sendmsg+0x214/0x460
[ 10.777296] ____sys_sendmsg+0x3b1/0x3f0
[ 10.781404] ___sys_sendmsg+0x9a/0xf0
[ 10.785279] __sys_sendmsg+0xe5/0x120
[ 10.789111] __x64_sys_sendmsg+0x1d/0x30
[ 10.793339] x64_sys_call+0x7da/0x22b0
[ 10.797375] do_syscall_64+0x7e/0x170
[ 10.801147] ? __count_memcg_events+0x86/0x160
[ 10.805756] ? count_memcg_events.constprop.0+0x2a/0x50
[ 10.811157] ? handle_mm_fault+0x1b1/0x2d0
[ 10.816001] ? do_user_addr_fault+0x5af/0x7b0
[ 10.820939] ? irqentry_exit_to_user_mode+0x43/0x250
[ 10.826117] ? irqentry_exit+0x21/0x40
[ 10.830026] ? clear_bhb_loop+0x15/0x70
[ 10.833979] ? clear_bhb_loop+0x15/0x70
[ 10.838102] ? clear_bhb_loop+0x15/0x70
[ 10.842078] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 10.847339] RIP: 0033:0x793839d35e14
[ 10.851112] Code: 15 11 b0 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d 55 32 0e 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
[ 10.870079] RSP: 002b:00007ffcd6793058 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 10.877956] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000793839d35e14
[ 10.885291] RDX: 0000000000000000 RSI: 00007ffcd67930c0 RDI: 0000000000000003
[ 10.892532] RBP: 00007ffcd6793130 R08: 0000000000000010 R09: 0000000000000001
[ 10.899775] R10: 00005fee9849a960 R11: 0000000000000202 R12: 0000000000000003
[ 10.907645] R13: 000000006745c36d R14: 00005fee8b6b8040 R15: 0000000000000000
[ 10.915603] </TASK>
[ 10.917898] Modules linked in: 8021q garp mrp stp llc binfmt_misc nls_iso8859_1 input_leds serio_raw sch_fq_codel nvme_fabrics efi_pstore dm_multipath vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 psmouse virtio_rng aesni_intel crypto_simd cryptd
[ 10.967222] ---[ end trace 0000000000000000 ]---
[ 11.096808] RIP: 0010:dql_completed+0x191/0x1b0
[ 11.101488] Code: 63 ce 01 48 89 47 58 e9 03 ff ff ff 45 85 e4 41 0f 95 c3 39 d9 0f 95 c1 41 84 cb 74 05 45 85 ed 78 0a 44 89 d1 e9 e5 fe ff ff <0f> 0b 01 c0 44 89 d1 29 c1 b8 00 00 00 00 0f 48 c8 eb 84 66 66 2e
[ 11.120410] RSP: 0018:ffffab5f00003cb0 EFLAGS: 00010297
[ 11.126281] RAX: 0000000000000036 RBX: ffffab5f00003d10 RCX: 0000000000000000
[ 11.133646] RDX: 0000000000000000 RSI: 0000000000000036 RDI: ffff97d445b28d00
[ 11.140901] RBP: ffffab5f00003d00 R08: 0000000000000000 R09: 0000000000000000
[ 11.148149] R10: ffff97d462c00000 R11: ffffffffbc0060c0 R12: ffff97d441aec000
[ 11.155401] R13: ffffab5f00003ccc R14: ffff97d445b28c00 R15: ffff97d446ac0a00
[ 11.162735] FS: 0000793839e71800(0000) GS:ffff97d462c00000(0000) knlGS:0000000000000000
[ 11.171458] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.177315] CR2: 00005fee8b709198 CR3: 0000000003e78006 CR4: 00000000003706f0
[ 11.184565] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.192370] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.199664] Kernel panic - not syncing: Fatal exception in interrupt
[ 11.206258] Kernel Offset: 0x38a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 11.309674] Rebooting in 10 seconds..

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089684/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
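
The reboot check described in the [Test Plan] could be partly automated by scanning the console output of each boot for the BUG signature seen in the trace. The following is only a minimal sketch under stated assumptions: it assumes the console log of every reboot has already been captured as text (how that is done on GCP is not covered here), and the names `find_bql_crashes` and `boots_are_clean` are hypothetical helpers, not part of any existing tool.

```python
import re

# Signature copied from the trace in this report. The source line number
# (99 here) may differ between kernel builds, so it is matched loosely.
BQL_BUG_RE = re.compile(r"kernel BUG at lib/dynamic_queue_limits\.c:\d+!")


def find_bql_crashes(console_log):
    """Return every console line that carries the BQL BUG signature."""
    return [
        line.strip()
        for line in console_log.splitlines()
        if BQL_BUG_RE.search(line)
    ]


def boots_are_clean(console_logs):
    """True if none of the captured boot logs contains the BQL BUG line."""
    return not any(find_bql_crashes(log) for log in console_logs)


if __name__ == "__main__":
    # Two illustrative boots: one taken from the trace above, one clean.
    crashed = (
        "[ 10.470815] kernel BUG at lib/dynamic_queue_limits.c:99!\n"
        "[ 10.476289] Oops: invalid opcode: 0000 [#1] SMP PTI\n"
    )
    clean = "[ 9.326748] cloud-init[617]: Cloud-init v. 24.4 running\n"
    print(find_bql_crashes(crashed))
    print(boots_are_clean([clean]))
    print(boots_are_clean([clean, crashed]))
```

A check like this only detects the crash pattern reported here; a boot that hangs or panics for a different reason would still need to be caught by the boot test itself.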