I can see this issue with 5.4.0-124-generic #140~18.04.1-Ubuntu on node
appleton-kernel as well.

After these messages, a CPU soft lockup follows:
[   19.296854] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[   19.296855] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[   19.296858] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[   19.296860] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[   19.347370] mlx5_core 0005:01:00.0 enP5p1s0f0: Link down
[   19.634790] ixgbe 000a:11:00.0: registered PHC device on enP10p17s0f0
[   21.492952] hns-nic HISI00C2:00 enahisic2i0: link up
[   21.492971] IPv6: ADDRCONF(NETDEV_CHANGE): enahisic2i0: link becomes ready
[   25.794327] EXT4-fs (nvme0n1p2): resizing filesystem from 390571008 to 390572113 blocks
[   25.794567] EXT4-fs (nvme0n1p2): resized filesystem to 390572113
[   27.550919] new mount options do not match the existing superblock, will be ignored
[   32.692121] fbcon: Taking over console
[   32.698403] Console: switching to colour frame buffer device 100x37
[   64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
[   64.283899] Modules linked in: nls_iso8859_1 ipmi_ssif input_leds joydev ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib hibmc_drm drm_vram_helper ses enclosure ttm hid_generic usbhid ib_uverbs hid ib_core marvell drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_ce mlx5_core hisi_sas_v2_hw ghash_ce sha2_ce sha256_arm64 ixgbe sha1_ce tls hisi_sas_main nvme xfrm_algo drm megaraid_sas nvme_core mdio mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[   64.283952] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-124-generic #140~18.04.1-Ubuntu
[   64.283954] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[   64.283956] pstate: 40400005 (nZcv daif +PAN -UAO)
[   64.283962] pc : __do_softirq+0x98/0x350
[   64.283966] lr : irq_exit+0xc0/0xc8
[   64.283967] sp : ffff8000123b3ef0
[   64.283969] x29: ffff8000123b3ef0 x28: ffff002fb7193d00 
[   64.283971] x27: 0000000000000000 x26: ffff8000123b4000 
[   64.283972] x25: ffff8000123b0000 x24: ffff001fba073600 
[   64.283974] x23: ffff8000127cbdb0 x22: 0000000000000000 
[   64.283976] x21: 0000000000000282 x20: 0000000000000002 
[   64.283977] x19: ffff800011b84000 x18: ffff800011268830 
[   64.283979] x17: 0000000000000000 x16: 0000000000000000 
[   64.283980] x15: 0000000000000001 x14: ffff002fbb9f21c8 
[   64.283982] x13: 0000000000000004 x12: 0000000000000003 
[   64.283984] x11: 0000000000000000 x10: 0000000000000040 
[   64.283985] x9 : ffff80001208f358 x8 : ffff80001208f350 
[   64.283987] x7 : ffff001fb9002270 x6 : 00000002a698ef5f 
[   64.283989] x5 : 00000000ffff0031 x4 : ffff802fa9e81000 
[   64.283991] x3 : ffff800011b84780 x2 : ffff802fa9e81000 
[   64.283993] x1 : 00000000000000e0 x0 : ffff800011b84780 
[   64.283995] Call trace:
[   64.283998]  __do_softirq+0x98/0x350
[   64.284000]  irq_exit+0xc0/0xc8
[   64.284003]  __handle_domain_irq+0x6c/0xc0
[   64.284005]  gic_handle_irq+0x84/0x2c0
[   64.284007]  el1_irq+0x104/0x1c0
[   64.284010]  arch_cpu_idle+0x34/0x1c0
[   64.284014]  default_idle_call+0x24/0x60
[   64.284016]  do_idle+0x1d8/0x2b8
[   64.284017]  cpu_startup_entry+0x2c/0xb0
[   64.284020]  secondary_start_kernel+0x198/0x288
[   98.196663] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   98.202575] rcu:     16-....: (3 GPs behind) idle=8fa/0/0x3 softirq=983/983 fqs=7488
[   98.210133]  (detected by 5, t=15002 jiffies, g=4709, q=3243)
[   98.210134] Task dump for CPU 16:
[   98.210137] swapper/16      R  running task        0     0      1 0x0000002a
[   98.210140] Call trace:
[   98.210146]  __switch_to+0xcc/0x210
[   98.210149]  0x0
[  119.928660] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 16-... } 15393 jiffies s: 229 root: 0x2/.
[  119.939266] rcu: blocking rcu_node structures: l=1:16-31:0x1/.
[  119.945099] Task dump for CPU 16:
[  119.945102] swapper/16      R  running task        0     0      1 0x0000002a
[  119.945108] Call trace:
[  119.945120]  __switch_to+0xcc/0x210
[  119.945127]  0x0
[  242.808432] INFO: task ureadahead:1097 blocked for more than 120 seconds.
[  242.815214]       Tainted: G             L    5.4.0-124-generic #140~18.04.1-Ubuntu
[  242.822868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.830691] ureadahead      D    0  1097      1 0x00000000
[  242.830695] Call trace:
[  242.830703]  __switch_to+0xcc/0x210
[  242.830710]  __schedule+0x310/0x7a8
[  242.830712]  schedule+0x38/0xa8
[  242.830714]  schedule_timeout+0x228/0x388
[  242.830716]  wait_for_completion+0xf4/0x4b8
[  242.830719]  __wait_rcu_gp+0x170/0x1a8
[  242.830722]  synchronize_rcu+0x68/0x98
[  242.830725]  ring_buffer_read_prepare_sync+0xc/0x18
[  242.830727]  __tracing_open+0x200/0x368
[  242.830729]  tracing_open+0xa4/0xf0
[  242.830733]  do_dentry_open+0x1cc/0x3e0
[  242.830735]  vfs_open+0x38/0x48
[  242.830738]  path_openat+0x2ac/0x1368
[  242.830740]  do_filp_open+0x88/0x108
[  242.830742]  do_sys_open+0x1b4/0x2e8
[  242.830743]  __arm64_sys_openat+0x2c/0x38
[  242.830746]  el0_svc_common.constprop.3+0x80/0x1f8
[  242.830748]  el0_svc_handler+0x34/0xa0
[  242.830750]  el0_svc+0x10/0x180


** Tags added: sru-20220808

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1958952

Title:
  ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0:
  mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ
  0x5a5aa9"

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  While investigating the SRU deployment failure, I noticed that dmesg
  is spammed with:

  Jan 25 07:48:36 appleton-kernel kernel: [   22.885627] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [   22.885628] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1218): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [   22.885629] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
  Jan 25 07:48:36 appleton-kernel kernel: [   22.885631] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
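
  To gauge how heavy the spam is, the messages can be tallied per PCI
  device and CQ number. The following is only a minimal sketch and not
  part of the original report: the function name count_bogus_cq and the
  regular expression are assumptions, modelled on the message format
  shown in the excerpt above.

```python
import re
from collections import Counter

# Pattern modelled on the mlx5_core message quoted in this report.
# The source line number (159) and the pid vary, so both are matched loosely.
BOGUS_CQ = re.compile(
    r"mlx5_core\s+(?P<dev>[0-9a-fA-F:.]+):\s+"
    r"mlx5_eq_comp_int:\d+:\(pid \d+\):\s+"
    r"Completion\s+event\s+for\s+bogus\s+CQ\s+(?P<cq>0x[0-9a-fA-F]+)"
)


def count_bogus_cq(log_text):
    """Return a Counter keyed by (pci_device, cq_number)."""
    # Mail clients and bug trackers hard-wrap long log lines, so collapse
    # every run of whitespace (including newlines) before matching.
    flat = " ".join(log_text.split())
    return Counter(
        (m.group("dev"), m.group("cq")) for m in BOGUS_CQ.finditer(flat)
    )
```

  For example, count_bogus_cq(open("syslog").read()).most_common() would
  list the noisiest device/CQ pairs first, where "syslog" stands for a
  saved copy of the log.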

  Issue found with Focal 5.4.0-96-generic

  The syslog is attached.

  I am not sure whether this is the cause of our deployment issue, but it seems odd to me.
  Here is the deployment issue:
    1. System successfully deployed with Focal
    2. Deployment process hangs at the "Enabling PPA" stage
    3. I cannot connect to this system manually; ssh hangs (soft lockup maybe?) after:
          Warning: Permanently added '10.229.50.13' (ECDSA) to the list of known hosts.

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-96-generic 5.4.0-96.109
  ProcVersionSignature: Ubuntu 5.4.0-96.109-generic 5.4.157
  Uname: Linux 5.4.0-96-generic aarch64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan 25 07:48 seq
   crw-rw---- 1 root audio 116, 33 Jan 25 07:48 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu27.21
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: skip
  Date: Tue Jan 25 07:53:33 2022
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 001 Device 004: ID 12d1:0003 Huawei Technologies Co., Ltd.
   Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
   Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  Lsusb-t:
   /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/2p, 480M
       |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
       |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
           |__ Port 1: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
           |__ Port 1: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M
  MachineType: Hisilicon D05
  PciMultimedia:

  ProcFB: 0 hibmcdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-96-generic root=UUID=3abb8e5a-2f46-4221-b664-cb02a273a249 ro sysrq_always_enabled
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-96-generic N/A
   linux-backports-modules-5.4.0-96-generic  N/A
   linux-firmware                            1.187.25
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 06/01/2018
  dmi.bios.vendor: Huawei
  dmi.bios.version: 1.50
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: BC11SPCD
  dmi.board.vendor: Huawei
  dmi.board.version: VER.A
  dmi.chassis.asset.tag: To be filled by O.E.M.
  dmi.chassis.type: 17
  dmi.chassis.vendor: Hisilicon
  dmi.chassis.version: To be filled by O.E.M.
  dmi.modalias: dmi:bvnHuawei:bvr1.50:bd06/01/2018:svnHisilicon:pnD05:pvrV100R001C00:rvnHuawei:rnBC11SPCD:rvrVER.A:cvnHisilicon:ct17:cvrTobefilledbyO.E.M.:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: D05
  dmi.product.sku: To be filled by O.E.M.
  dmi.product.version: V100R001C00
  dmi.sys.vendor: Hisilicon
