I can see this issue with 5.4.0-124-generic #140~18.04.1-Ubuntu on node appleton-kernel as well.
After this, it's a CPU soft lockup:

[ 19.296854] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296855] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296858] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296860] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.347370] mlx5_core 0005:01:00.0 enP5p1s0f0: Link down
[ 19.634790] ixgbe 000a:11:00.0: registered PHC device on enP10p17s0f0
[ 21.492952] hns-nic HISI00C2:00 enahisic2i0: link up
[ 21.492971] IPv6: ADDRCONF(NETDEV_CHANGE): enahisic2i0: link becomes ready
[ 25.794327] EXT4-fs (nvme0n1p2): resizing filesystem from 390571008 to 390572113 blocks
[ 25.794567] EXT4-fs (nvme0n1p2): resized filesystem to 390572113
[ 27.550919] new mount options do not match the existing superblock, will be ignored
[ 32.692121] fbcon: Taking over console
[ 32.698403] Console: switching to colour frame buffer device 100x37
[ 64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
[ 64.283899] Modules linked in: nls_iso8859_1 ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hibmc_drm drm_vram_helper ses enclosure ttm hid_generic usbhid ib_uverbs hid ib_core marvell drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_ce mlx5_core hisi_sas_v2_hw ghash_ce sha2_ce sha256_arm64 ixgbe sha1_ce tls hisi_sas_main nvme xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 64.283952] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 64.283954] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 64.283956] pstate: 40400005 (nZcv daif +PAN -UAO)
[ 64.283962] pc : __do_softirq+0x98/0x350
[ 64.283966] lr : irq_exit+0xc0/0xc8
[ 64.283967] sp : ffff8000123b3ef0
[ 64.283969] x29: ffff8000123b3ef0 x28: ffff002fb7193d00
[ 64.283971] x27: 0000000000000000 x26: ffff8000123b4000
[ 64.283972] x25: ffff8000123b0000 x24: ffff001fba073600
[ 64.283974] x23: ffff8000127cbdb0 x22: 0000000000000000
[ 64.283976] x21: 0000000000000282 x20: 0000000000000002
[ 64.283977] x19: ffff800011b84000 x18: ffff800011268830
[ 64.283979] x17: 0000000000000000 x16: 0000000000000000
[ 64.283980] x15: 0000000000000001 x14: ffff002fbb9f21c8
[ 64.283982] x13: 0000000000000004 x12: 0000000000000003
[ 64.283984] x11: 0000000000000000 x10: 0000000000000040
[ 64.283985] x9 : ffff80001208f358 x8 : ffff80001208f350
[ 64.283987] x7 : ffff001fb9002270 x6 : 00000002a698ef5f
[ 64.283989] x5 : 00000000ffff0031 x4 : ffff802fa9e81000
[ 64.283991] x3 : ffff800011b84780 x2 : ffff802fa9e81000
[ 64.283993] x1 : 00000000000000e0 x0 : ffff800011b84780
[ 64.283995] Call trace:
[ 64.283998] __do_softirq+0x98/0x350
[ 64.284000] irq_exit+0xc0/0xc8
[ 64.284003] __handle_domain_irq+0x6c/0xc0
[ 64.284005] gic_handle_irq+0x84/0x2c0
[ 64.284007] el1_irq+0x104/0x1c0
[ 64.284010] arch_cpu_idle+0x34/0x1c0
[ 64.284014] default_idle_call+0x24/0x60
[ 64.284016] do_idle+0x1d8/0x2b8
[ 64.284017] cpu_startup_entry+0x2c/0xb0
[ 64.284020] secondary_start_kernel+0x198/0x288
[ 98.196663] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 98.202575] rcu: 16-....: (3 GPs behind) idle=8fa/0/0x3 softirq=983/983 fqs=7488
[ 98.210133] (detected by 5, t=15002 jiffies, g=4709, q=3243)
[ 98.210134] Task dump for CPU 16:
[ 98.210137] swapper/16 R running task 0 0 1 0x0000002a
[ 98.210140] Call trace:
[ 98.210146] __switch_to+0xcc/0x210
[ 98.210149] 0x0
[ 119.928660] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 16-... } 15393 jiffies s: 229 root: 0x2/.
[ 119.939266] rcu: blocking rcu_node structures: l=1:16-31:0x1/.
[ 119.945099] Task dump for CPU 16:
[ 119.945102] swapper/16 R running task 0 0 1 0x0000002a
[ 119.945108] Call trace:
[ 119.945120] __switch_to+0xcc/0x210
[ 119.945127] 0x0
[ 242.808432] INFO: task ureadahead:1097 blocked for more than 120 seconds.
[ 242.815214] Tainted: G L 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 242.822868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.830691] ureadahead D 0 1097 1 0x00000000
[ 242.830695] Call trace:
[ 242.830703] __switch_to+0xcc/0x210
[ 242.830710] __schedule+0x310/0x7a8
[ 242.830712] schedule+0x38/0xa8
[ 242.830714] schedule_timeout+0x228/0x388
[ 242.830716] wait_for_completion+0xf4/0x4b8
[ 242.830719] __wait_rcu_gp+0x170/0x1a8
[ 242.830722] synchronize_rcu+0x68/0x98
[ 242.830725] ring_buffer_read_prepare_sync+0xc/0x18
[ 242.830727] __tracing_open+0x200/0x368
[ 242.830729] tracing_open+0xa4/0xf0
[ 242.830733] do_dentry_open+0x1cc/0x3e0
[ 242.830735] vfs_open+0x38/0x48
[ 242.830738] path_openat+0x2ac/0x1368
[ 242.830740] do_filp_open+0x88/0x108
[ 242.830742] do_sys_open+0x1b4/0x2e8
[ 242.830743] __arm64_sys_openat+0x2c/0x38
[ 242.830746] el0_svc_common.constprop.3+0x80/0x1f8
[ 242.830748] el0_svc_handler+0x34/0xa0
[ 242.830750] el0_svc+0x10/0x180

** Tags added: sru-20220808

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1958952

Title:
  ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9"

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  While investigating the SRU deployment failure, I noticed that dmesg is spammed with:
  Jan 25 07:48:36 appleton-kernel kernel: [ 22.885627] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [ 22.885628] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1218): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [ 22.885629] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [ 22.885631] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9

  Issue found with Focal 5.4.0-96-generic

  Please find the syslog attached. Not sure if this is the cause of our deployment issue, but it seems odd to me.

  And here is our deployment issue:
  1. System successfully deployed with Focal
  2. Deployment process hangs at the "Enabling PPA" stage
  3. I cannot connect to this system manually, ssh hangs (soft lockup maybe?) after:
  Warning: Permanently added '10.229.50.13' (ECDSA) to the list of known hosts.

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-96-generic 5.4.0-96.109
  ProcVersionSignature: Ubuntu 5.4.0-96.109-generic 5.4.157
  Uname: Linux 5.4.0-96-generic aarch64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116, 1 Jan 25 07:48 seq
   crw-rw---- 1 root audio 116, 33 Jan 25 07:48 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.21
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: skip
  Date: Tue Jan 25 07:53:33 2022
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 001 Device 004: ID 12d1:0003 Huawei Technologies Co., Ltd.
   Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
   Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  Lsusb-t:
   /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/2p, 480M
   |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
   |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
   |__ Port 1: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
   |__ Port 1: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M
  MachineType: Hisilicon D05
  PciMultimedia:
  ProcFB: 0 hibmcdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-96-generic root=UUID=3abb8e5a-2f46-4221-b664-cb02a273a249 ro sysrq_always_enabled
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-96-generic N/A
   linux-backports-modules-5.4.0-96-generic N/A
   linux-firmware 1.187.25
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 06/01/2018
  dmi.bios.vendor: Huawei
  dmi.bios.version: 1.50
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: BC11SPCD
  dmi.board.vendor: Huawei
  dmi.board.version: VER.A
  dmi.chassis.asset.tag: To be filled by O.E.M.
  dmi.chassis.type: 17
  dmi.chassis.vendor: Hisilicon
  dmi.chassis.version: To be filled by O.E.M.
  dmi.modalias: dmi:bvnHuawei:bvr1.50:bd06/01/2018:svnHisilicon:pnD05:pvrV100R001C00:rvnHuawei:rnBC11SPCD:rvrVER.A:cvnHisilicon:ct17:cvrTobefilledbyO.E.M.:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: D05
  dmi.product.sku: To be filled by O.E.M.
  dmi.product.version: V100R001C00
  dmi.sys.vendor: Hisilicon

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1958952/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp