The MAAS deploy/release loop with focal[1] on d05-3 has run 82 times without a failure. The MAAS deploy/release loop with bionic-hwe on appleton ran 100 times, and 10 of those runs failed.
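As a rough sanity check on the two loop results, here is a minimal sketch of the arithmetic. It assumes (hypothetically) that each deploy/release cycle fails independently and that d05-3 would fail at the same per-run rate observed on appleton; the helper name `prob_no_failures` is made up for illustration. It is not a rigorous statistical test.

```python
# Observed counts from the two deploy/release loops reported in this comment.
appleton_runs, appleton_failures = 100, 10
d05_3_runs, d05_3_failures = 82, 0

# Assumption: runs are independent and share one fixed failure rate.
appleton_rate = appleton_failures / appleton_runs


def prob_no_failures(rate: float, runs: int) -> float:
    """Probability of seeing zero failures in `runs` independent runs."""
    return (1.0 - rate) ** runs


p = prob_no_failures(appleton_rate, d05_3_runs)
print(f"appleton failure rate: {appleton_rate:.0%}")
print(f"P(0 failures in {d05_3_runs} runs at that rate): {p:.2e}")
```

Under those assumptions the probability comes out well below 0.1%, so 82 clean runs would be unlikely if d05-3 behaved like appleton. Note that the two loops used different series (focal vs. bionic-hwe), so this alone does not separate the machine from the kernel flavour.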
It looks like this issue only happens on appleton.

--
[1] For some reason I cannot deploy bionic-hwe on d05-3. Working on it.

--
https://bugs.launchpad.net/bugs/1953058

Title:
Kernel "BUG: soft lockup" with 5.4 kernels on arm64 node appleton node (dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9")

Status in ubuntu-kernel-tests:
New
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Focal:
Confirmed

Bug description:
The regression boot test running with bionic:linux-hwe-5.4 5.4.0-92.103~18.04.2 failed because of the following hung task:

Dec 2 12:17:12 appleton-kernel kernel: [ 64.281447] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288573] Modules linked in: ipmi_ssif nls_iso8859_1 joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hid_generic ses usbhid enclosure hid ib_uverbs ib_core marvell hibmc_drm drm_vram_helper ttm drm_kms_helper crct10dif_ce ghash_ce syscopyarea sysfillrect sha2_ce mlx5_core sysimgblt sha256_arm64 ixgbe hisi_sas_v2_hw fb_sys_fops nvme sha1_ce hisi_sas_main tls xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288629] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-91-generic #102~18.04.1-Ubuntu
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288630] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288632] pstate: 40400005 (nZcv daif +PAN -UAO)
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288640] pc : __do_softirq+0x98/0x350
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288644] lr : irq_exit+0xc0/0xc8
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288645] sp : ffff800011ee3ef0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288646] x29: ffff800011ee3ef0 x28: ffff002fb71a2d00
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288649] x27: 0000000000000000 x26: ffff800011ee4000
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288650] x25: ffff800011ee0000 x24: ffff001fba073600
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288652] x23: ffff80001234bdb0 x22: 0000000000000000
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288654] x21: 0000000000000282 x20: 0000000000000002
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288656] x19: ffff8000116b3000 x18: ffff800011267510
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288658] x17: 0000000000000000 x16: 0000000000000000
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288659] x15: 0000000000000001 x14: ffff002fbb9f21c8
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288661] x13: 0000000000000004 x12: 0000000000000002
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288663] x11: 0000000000000000 x10: 0000000000000040
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288665] x9 : ffff800011bbf228 x8 : ffff800011bbf220
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288666] x7 : ffff001fb9002270 x6 : 00000002c07fa07f
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288668] x5 : 00000000ffff00c1 x4 : ffff802faa352000
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288670] x3 : ffff8000116b3780 x2 : ffff802faa352000
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288672] x1 : 00000000000000e0 x0 : ffff8000116b3780
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288675] Call trace:
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288677] __do_softirq+0x98/0x350
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288679] irq_exit+0xc0/0xc8
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288683] __handle_domain_irq+0x6c/0xc0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288685] gic_handle_irq+0x84/0x2c0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288687] el1_irq+0x104/0x1c0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288690] arch_cpu_idle+0x34/0x1c0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288694] default_idle_call+0x24/0x60
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288696] do_idle+0x1d8/0x2b8
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288699] cpu_startup_entry+0x28/0xb0
Dec 2 12:17:12 appleton-kernel kernel: [ 64.288702] secondary_start_kernel+0x198/0x288
Dec 2 12:17:46 appleton-kernel kernel: [ 98.829315] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Dec 2 12:17:46 appleton-kernel kernel: [ 98.835229] rcu: 16-....: (17 GPs behind) idle=3b2/0/0x1 softirq=877/877 fqs=7317
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842875] (detected by 35, t=15005 jiffies, g=3737, q=2560)
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842877] Task dump for CPU 16:
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842880] swapper/16 R running task 0 0 1 0x0000002a
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842885] Call trace:
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842897] __switch_to+0x108/0x248
Dec 2 12:17:46 appleton-kernel kernel: [ 98.842902] 0x0
Dec 2 12:20:46 appleton-kernel kernel: [ 278.848845] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Dec 2 12:20:46 appleton-kernel kernel: [ 278.854762] rcu: 16-....: (17 GPs behind) idle=3b2/0/0x1 softirq=877/877 fqs=29680
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862495] (detected by 48, t=60007 jiffies, g=3737, q=14867)
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862498] Task dump for CPU 16:
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862500] swapper/16 R running task 0 0 1 0x0000002a
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862506] Call trace:
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862520] __switch_to+0x108/0x248
Dec 2 12:20:46 appleton-kernel kernel: [ 278.862526] 0x0

This is not as easy to reproduce with the focal:linux kernel; however, I was once able to get the following stack trace, where it hung in mlx5_core code:

[ 356.542512] watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [swapper/23:0]
[ 356.549633] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure hid_generic usbhid hid ib_uverbs marvell ib_core hibmc_drm drm_vram_helper ttm drm_kms_helper crct10dif_ce syscopyarea sysfillrect ghash_ce sha2_ce sysimgblt mlx5_core fb_sys_fops hisi_sas_v2_hw sha256_arm64 ixgbe sha1_ce nvme hisi_sas_main tls xfrm_algo drm mdio libsas megaraid_sas nvme_core mlxfw ehci_platform hns_dsaf scsi_transport_sas hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 356.620860] CPU: 23 PID: 0 Comm: swapper/23 Not tainted 5.4.0-91-generic #102-Ubuntu
[ 356.628588] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 356.635188] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 356.640007] pc : mlx5e_poll_rx_cq+0x30/0x8a0 [mlx5_core]
[ 356.645338] lr : mlx5e_napi_poll+0x100/0x660 [mlx5_core]
[ 356.650635] sp : ffff800011efbcc0
[ 356.653935] x29: ffff800011efbcc0 x28: 0000000000000001
[ 356.659233] x27: ffff009fa0ed8000 x26: ffff009fa0ed8000
[ 356.664531] x25: ffff009fa0eda480 x24: ffff009fa0ed7ec8
[ 356.669829] x23: 0000000000000040 x22: 0000000000000000
[ 356.675126] x21: ffff002fbbaeddc0 x20: 0000000000000040
[ 356.680424] x19: ffff009fa0ed8100 x18: 0000000000000010
[ 356.685721] x17: 0000000000000000 x16: 0000000000000000
[ 356.691019] x15: ffff002fb71adf28 x14: 6220726f6620746e
[ 356.696316] x13: ffffffffffffc138 x12: ffffffffffffe238
[ 356.701614] x11: 0000000000003138 x10: 3935313a746e695f
[ 356.706911] x9 : ffff800011db5000 x8 : 00000000000006d9
[ 356.712209] x7 : 0000000000000017 x6 : ffff002f8d1c2200
[ 356.717506] x5 : 000000000000000a x4 : 0000000000000006
[ 356.722804] x3 : 0000000000000000 x2 : 0000000000000040
[ 356.728101] x1 : 0000000000000040 x0 : ffff8000094f70a8
[ 356.733400] Call trace:
[ 356.735867] mlx5e_poll_rx_cq+0x30/0x8a0 [mlx5_core]
[ 356.740850] mlx5e_napi_poll+0x100/0x660 [mlx5_core]
[ 356.745805] net_rx_action+0x180/0x488
[ 356.749543] __do_softirq+0x130/0x34c
[ 356.753194] irq_exit+0xa4/0xc8
[ 356.756323] __handle_domain_irq+0x74/0xc8
[ 356.760406] gic_handle_irq+0x10c/0x2cc
[ 356.764228] el1_irq+0x104/0x1c0
[ 356.767443] arch_cpu_idle+0x3c/0x1c8
[ 356.771093] default_idle_call+0x24/0x60
[ 356.775003] do_idle+0x214/0x298
[ 356.778218] cpu_startup_entry+0x30/0xb8
[ 356.782128] secondary_start_kernel+0x16c/0x1e0
[ 390.270862] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 390.276771] rcu: 23-....: (364 GPs behind) idle=966/0/0x3 softirq=1461/1461 fqs=7130
[ 390.284676] (detected by 10, t=15005 jiffies, g=15605, q=284)
[ 390.290499] Call trace:
[ 390.292935] __switch_to+0x134/0x190
[ 390.296498] 0xffff002fb71ada00
[ 392.554895] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 23-... } 15478 jiffies s: 645 root: 0x2/.
[ 392.565495] rcu: blocking rcu_node structures: l=1:16-31:0x80/.
[ 392.571412] Call trace:
[ 392.573848] __switch_to+0x134/0x190
[ 392.577413] 0xffff002fb71ada00
[ 424.543333] watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [swapper/23:0]
[ 424.550454] Modules linked in: nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ses enclosure hid_generic usbhid hid ib_uverbs marvell ib_core hibmc_drm drm_vram_helper ttm drm_kms_helper crct10dif_ce syscopyarea sysfillrect ghash_ce sha2_ce sysimgblt mlx5_core fb_sys_fops hisi_sas_v2_hw sha256_arm64 ixgbe sha1_ce nvme hisi_sas_main tls xfrm_algo drm mdio libsas megaraid_sas nvme_core mlxfw ehci_platform hns_dsaf scsi_transport_sas hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 424.621659] CPU: 23 PID: 0 Comm: swapper/23 Tainted: G L 5.4.0-91-generic #102-Ubuntu
[ 424.630775] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 424.637374] pstate: 20400005 (nzCv daif +PAN -UAO)
[ 424.642152] pc : tasklet_action_common.isra.0+0x5c/0x1a0
[ 424.647450] lr : tasklet_action+0x30/0x38
[ 424.651444] sp : ffff800011efbe50
[ 424.654745] x29: ffff800011efbe50 x28: ffff002fbbad8fc0
[ 424.660043] x27: 0000000000000003 x26: ffff002fb71ada00
[ 424.665340] x25: 0000000000000100 x24: 0000000000000006
[ 424.670638] x23: 0000000000000060 x22: 00000000000000e0
[ 424.675935] x21: 0000000000000004 x20: ffff002fa94150a0
[ 424.681233] x19: ffff800011b750f0 x18: 0000000000000010
[ 424.686530] x17: 0000000000000000 x16: 0000000000000000
[ 424.691828] x15: ffff002fb71adf28 x14: 6220726f6620746e
[ 424.697125] x13: ffffffffffffc138 x12: ffffffffffffe238
[ 424.702423] x11: 0000000000003138 x10: 3935313a746e695f
[ 424.707720] x9 : 0000000000003c80 x8 : 00000000000006d9
[ 424.713018] x7 : 0000000000000000 x6 : 00000006e8cc2da3
[ 424.718315] x5 : 00ffffffffffffff x4 : ffff002fad862680
[ 424.723613] x3 : 00000000000004af x2 : ffff802faa45b000
[ 424.728910] x1 : 0000000000000000 x0 : ffff002fb71adf28
[ 424.734208] Call trace:
[ 424.736642] tasklet_action_common.isra.0+0x5c/0x1a0
[ 424.741593] tasklet_action+0x30/0x38
[ 424.745241] __do_softirq+0x130/0x34c
[ 424.748890] irq_exit+0xa4/0xc8
[ 424.752018] __handle_domain_irq+0x74/0xc8
[ 424.756100] gic_handle_irq+0x10c/0x2cc
[ 424.759922] el1_irq+0x104/0x1c0
[ 424.763137] arch_cpu_idle+0x3c/0x1c8
[ 424.766785] default_idle_call+0x24/0x60
[ 424.770694] do_idle+0x214/0x298
[ 424.773909] cpu_startup_entry+0x30/0xb8
[ 424.777818] secondary_start_kernel+0x16c/0x1e0
[ 484.716312] INFO: task shutdown:1 blocked for more than 120 seconds.
[ 484.722658] Tainted: G L 5.4.0-91-generic #102-Ubuntu
[ 484.729521] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.737341] Call trace:
[ 484.739776] __switch_to+0x134/0x190
[ 484.743341] __schedule+0x31c/0x7e0
[ 484.746818] schedule+0x40/0xb8
[ 484.749950] synchronize_rcu_expedited+0x1e0/0x3a8
[ 484.754730] synchronize_net+0x24/0x38
[ 484.758469] netif_napi_del+0xac/0xb0
[ 484.762155] mlx5e_close_channel+0x38/0x58 [mlx5_core]
[ 484.767315] mlx5e_close_channels+0x38/0x60 [mlx5_core]
[ 484.772562] mlx5e_close_locked+0x70/0x90 [mlx5_core]
[ 484.777634] mlx5e_close+0x58/0x80 [mlx5_core]
[ 484.782101] mlx5e_nic_disable+0x9c/0xc0 [mlx5_core]
[ 484.787087] mlx5e_detach_netdev+0x54/0x90 [mlx5_core]
[ 484.792246] mlx5e_detach+0x60/0x78 [mlx5_core]
[ 484.796796] mlx5_detach_device+0xb4/0x138 [mlx5_core]
[ 484.801954] mlx5_unload_one+0xc4/0x170 [mlx5_core]
[ 484.806852] shutdown+0x1c4/0x1e0 [mlx5_core]
[ 484.811202] pci_device_shutdown+0x48/0x88
[ 484.815288] device_shutdown+0x14c/0x298
[ 484.819201] kernel_power_off+0x40/0x78
[ 484.823026] __do_sys_reboot+0x158/0x228
[ 484.826937] __arm64_sys_reboot+0x30/0x40
[ 484.830936] el0_svc_common.constprop.0+0xf4/0x200
[ 484.835715] el0_svc_handler+0x38/0xa8
[ 484.839454] el0_svc+0x10/0x2c8

I am not sure whether the mlx5 issue is a symptom or the root cause.
This is not a regression, as I was able to reproduce it with hwe version 5.4.0-89-generic. I have not tried older kernels, but it's possible that this issue has been present for a long time.