[Kernel-packages] [Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

Mitchell Augustin Wed, 20 Mar 2024 16:11:50 -0700

This is also reproducible on the latest mainline kernel build
(https://kernel.ubuntu.com/mainline/v6.8/arm64/, retrieved 20 Mar 2024 @
5 PM):


20 Mar 22:54: Running stress-ng aiol stressor for 240 seconds...
[  354.451450] Unable to handle kernel paging request at virtual address 17be9b4aa3e187be
[  354.459580] Mem abort info:
[  354.462439]   ESR = 0x0000000096000021
[  354.466274]   EC = 0x25: DABT (current EL), IL = 32 bits
[  354.471703]   SET = 0, FnV = 0
[  354.474819]   EA = 0, S1PTW = 0
[  354.478024]   FSC = 0x21: alignment fault
[  354.482118] Data abort info:
[  354.485056]   ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
[  354.490662]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  354.495823]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  354.501251] [17be9b4aa3e187be] address between user and kernel address ranges
[  354.508548] Internal error: Oops: 0000000096000021 [#1] SMP
[  354.514245] Modules linked in: qrtr cfg80211 binfmt_misc nls_iso8859_1 input_leds dax_hmem cxl_acpi acpi_ipmi onboard_usb_hub nvidia_cspmu ipmi_ssif cxl_core ipmi_devintf arm_cspmu_module arm_smmuv3_pmu ipmi_msghandler uio_pdrv_genirq uio spi_nor cppc_cpufreq joydev mtd acpi_power_meter dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 hid_generic rndis_host usbhid cdc_ether hid usbnet uas usb_storage crct10dif_ce polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 nvme sha3_ce i2c_smbus ixgbe sha2_ce nvme_core ast sha256_arm64 xhci_pci sha1_ce xfrm_algo xhci_pci_renesas i2c_algo_bit nvme_auth mdio spi_tegra210_quad i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  354.594676] CPU: 61 PID: 0 Comm: swapper/61 Kdump: loaded Not tainted 6.8.0-060800-generic-64k #202403131158
[  354.604728] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
[  354.611844] pstate: 034000c9 (nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  354.618962] pc : _raw_spin_lock_irqsave+0x44/0x100
[  354.623863] lr : try_to_wake_up+0x68/0x758
[  354.628053] sp : ffff8000807afaf0
[  354.631436] x29: ffff8000807afaf0 x28: 0000000000040000 x27: 0000000000000000
[  354.638731] x26: ffffa06103dc8a98 x25: ffff8000807afd98 x24: 0000000000000002
[  354.646027] x23: ffff0000f8156840 x22: 17be9b4aa3e187be x21: 0000000000000000
[  354.653323] x20: 0000000000000003 x19: 00000000000000c0 x18: ffff8000819a0098
[  354.660619] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe97dca18
[  354.667914] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  354.675208] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa06100ba6810
[  354.682504] x8 : 0000000000000000 x7 : 0000004000000000 x6 : 0000000000009080
[  354.689800] x5 : 0000c2fb0dc488b0 x4 : 0000000000000000 x3 : ffff0000894178c0
[  354.697096] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 17be9b4aa3e187be
[  354.704391] Call trace:
[  354.706886]  _raw_spin_lock_irqsave+0x44/0x100
[  354.711426]  try_to_wake_up+0x68/0x758
[  354.715254]  wake_up_process+0x24/0x50
[  354.719082]  aio_complete+0x1c4/0x2b8
[  354.722825]  aio_complete_rw+0x11c/0x2c8
[  354.726831]  iomap_dio_bio_end_io+0x1f0/0x248
[  354.731282]  bio_endio+0x170/0x270
[  354.734758]  __dm_io_complete+0x180/0x200
[  354.738855]  clone_endio+0xc8/0x288
[  354.742416]  bio_endio+0x170/0x270
[  354.745889]  blk_mq_end_request_batch+0x2e0/0x558
[  354.750696]  nvme_pci_complete_batch+0x94/0x118 [nvme]
[  354.755958]  nvme_irq+0x9c/0xb0 [nvme]
[  354.759788]  __handle_irq_event_percpu+0x68/0x2c0
[  354.764595]  handle_irq_event+0x58/0xe8
[  354.768511]  handle_fasteoi_irq+0xb0/0x218
[  354.772695]  generic_handle_domain_irq+0x38/0x70
[  354.777411]  __gic_handle_irq_from_irqson.isra.0+0x180/0x310
[  354.783195]  gic_handle_irq+0x2c/0xa0
[  354.786935]  call_on_irq_stack+0x3c/0x50
[  354.790941]  do_interrupt_handler+0xb0/0xc8
[  354.795214]  el1_interrupt+0x48/0xf0
[  354.798866]  el1h_64_irq_handler+0x1c/0x40
[  354.803050]  el1h_64_irq+0x7c/0x80
[  354.806523]  cpuidle_enter_state+0xd8/0x790
[  354.810795]  cpuidle_enter+0x44/0x78
[  354.814446]  cpuidle_idle_call+0x15c/0x210
[  354.818631]  do_idle+0xb0/0x130
[  354.821837]  cpu_startup_entry+0x44/0x50
[  354.825845]  secondary_start_kernel+0xec/0x130
[  354.830386]  __secondary_switched+0xc0/0xc8
[  354.834661] Code: b9001041 d503201f 52800001 52800022 (88e17c02) 
[  354.840893] SMP: stopping secondary CPUs
[  355.897569] SMP: failed to stop secondary CPUs 0-60,62-143
[  355.904206] Starting crashdump kernel...
[  355.908214] ------------[ cut here ]------------
[  355.912930] Some CPUs may be stale, kdump will be unreliable.
[  355.918807] WARNING: CPU: 61 PID: 0 at arch/arm64/kernel/machine_kexec.c:174 machine_kexec+0x48/0x1f0
[  355.928236] Modules linked in: qrtr cfg80211 binfmt_misc nls_iso8859_1 input_leds dax_hmem cxl_acpi acpi_ipmi onboard_usb_hub nvidia_cspmu ipmi_ssif cxl_core ipmi_devintf arm_cspmu_module arm_smmuv3_pmu ipmi_msghandler uio_pdrv_genirq uio spi_nor cppc_cpufreq joydev mtd acpi_power_meter dm_multipath nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 hid_generic rndis_host usbhid cdc_ether hid usbnet uas usb_storage crct10dif_ce polyval_ce polyval_generic ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sm3 nvme sha3_ce i2c_smbus ixgbe sha2_ce nvme_core ast sha256_arm64 xhci_pci sha1_ce xfrm_algo xhci_pci_renesas i2c_algo_bit nvme_auth mdio spi_tegra210_quad i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  356.008649] CPU: 61 PID: 0 Comm: swapper/61 Kdump: loaded Not tainted 6.8.0-060800-generic-64k #202403131158
[  356.018699] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023
[  356.025815] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[  356.032932] pc : machine_kexec+0x48/0x1f0
[  356.037027] lr : machine_kexec+0x48/0x1f0
[  356.041121] sp : ffff8000807af620
[  356.044504] x29: ffff8000807af620 x28: ffff0000894178c0 x27: 0000000000000000
[  356.051800] x26: ffffa06102735fd8 x25: 00000000000000c0 x24: 0000000000000000
[  356.059096] x23: ffffa0610273afc0 x22: ffffa061043200f8 x21: ffffa0610437a000
[  356.066393] x20: ffff0000d13db400 x19: ffff0000d13db400 x18: ffff8000819a00e8
[  356.073688] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000040000
[  356.080983] x14: 0000000000000000 x13: 2e656c6261696c65 x12: 726e75206562206c
[  356.088279] x11: 6c697720706d7564 x10: 0000000000000000 x9 : 0000000000000000
[  356.095574] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[  356.102871] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[  356.110166] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[  356.117463] Call trace:
[  356.119957]  machine_kexec+0x48/0x1f0
[  356.123698]  __crash_kexec+0x94/0x128
[  356.127440]  crash_kexec+0x4c/0xb8
[  356.130913]  die+0x27c/0x2c8
[  356.133853]  die_kernel_fault+0x110/0x1d0
[  356.137948]  __do_kernel_fault+0x1e4/0x200
[  356.142133]  do_alignment_fault+0x90/0xc8
[  356.146228]  do_mem_abort+0x50/0xd0
[  356.149789]  el1_abort+0x50/0xd8
[  356.153086]  el1h_64_sync_handler+0x114/0x1c0
[  356.157536]  el1h_64_sync+0x7c/0x80
[  356.161098]  _raw_spin_lock_irqsave+0x44/0x100
[  356.165636]  try_to_wake_up+0x68/0x758
[  356.169466]  wake_up_process+0x24/0x50
[  356.173295]  aio_complete+0x1c4/0x2b8
[  356.177037]  aio_complete_rw+0x11c/0x2c8
[  356.181042]  iomap_dio_bio_end_io+0x1f0/0x248
[  356.185494]  bio_endio+0x170/0x270
[  356.188969]  __dm_io_complete+0x180/0x200
[  356.193066]  clone_endio+0xc8/0x288
[  356.196627]  bio_endio+0x170/0x270
[  356.200101]  blk_mq_end_request_batch+0x2e0/0x558
[  356.204909]  nvme_pci_complete_batch+0x94/0x118 [nvme]
[  356.210164]  nvme_irq+0x9c/0xb0 [nvme]
[  356.213995]  __handle_irq_event_percpu+0x68/0x2c0
[  356.218802]  handle_irq_event+0x58/0xe8
[  356.222718]  handle_fasteoi_irq+0xb0/0x218
[  356.226903]  generic_handle_domain_irq+0x38/0x70
[  356.231619]  __gic_handle_irq_from_irqson.isra.0+0x180/0x310
[  356.237405]  gic_handle_irq+0x2c/0xa0
[  356.241144]  call_on_irq_stack+0x3c/0x50
[  356.245150]  do_interrupt_handler+0xb0/0xc8
[  356.249424]  el1_interrupt+0x48/0xf0
[  356.253074]  el1h_64_irq_handler+0x1c/0x40
[  356.257258]  el1h_64_irq+0x7c/0x80
[  356.260731]  cpuidle_enter_state+0xd8/0x790
[  356.265003]  cpuidle_enter+0x44/0x78
[  356.268655]  cpuidle_idle_call+0x15c/0x210
[  356.272841]  do_idle+0xb0/0x130
[  356.276048]  cpu_startup_entry+0x44/0x50
[  356.280053]  secondary_start_kernel+0xec/0x130
[  356.284594]  __secondary_switched+0xc0/0xc8
[  356.288868] ---[ end trace 0000000000000000 ]---
[  356.293585] Bye!
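
For anyone checking the abort info above by hand, here is a minimal Python sketch that re-derives the EC, IL, FSC and WnR fields from the raw ESR value and checks the faulting address taken from the log. The field positions are the standard ESR_EL1 layout for data aborts. The 4-byte access size is an assumption on my part, read from the size field (bits [31:30] = 0b10) of the faulting opcode 88e17c02 in the Code: line. The helper name decode_esr is illustrative only.

```python
# Values copied from the oops above.
ESR = 0x0000000096000021
FAULT_ADDR = 0x17BE9B4AA3E187BE
ACCESS_SIZE = 4  # assumption: 32-bit access, from opcode 0x88e17c02 bits [31:30]


def decode_esr(esr: int) -> dict:
    """Split an ESR_EL1 value into the fields the kernel prints for a data abort."""
    iss = esr & 0x1FFFFFF              # ISS  = bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # instruction length bit (1 = 32-bit)
        "ISS": iss,
        "FSC": iss & 0x3F,             # fault status code, ISS[5:0]
        "WnR": (iss >> 6) & 0x1,       # write-not-read, ISS[6]
    }


fields = decode_esr(ESR)
misalignment = FAULT_ADDR % ACCESS_SIZE   # non-zero means unaligned for this size
top16 = FAULT_ADDR >> 48                  # 0x0000 = user range, 0xffff = kernel range

print({name: hex(value) for name, value in fields.items()})
print("offset from 4-byte alignment:", misalignment)
print("top 16 address bits:", hex(top16))
```

The decoded fields agree with what the kernel printed (EC = 0x25, FSC = 0x21, WnR = 0). Under the 4-byte assumption the address in x0 is off by 2 bytes, and its top bits are neither all zeros nor all ones, which is consistent with the "address between user and kernel address ranges" line and suggests x0 held a garbage pointer rather than a slightly offset valid one.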

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2058557

Title:
  Kernel panic during checkbox stress_ng_test on Grace running noble 6.8
  (arm64+largemem) kernel

Status in linux package in Ubuntu:
  New

Bug description:
  A kernel oops and panic occurred during 22.04 SoC certification on
  Gunyolk (Grace/Grace) with the 6.8 kernel, arm64+largemem variant.

  Steps to reproduce:
  Run (as root) the following commands:

  add-apt-repository -y ppa:checkbox-dev/stable
  apt-add-repository -y ppa:firmware-testing-team/ppa-fwts-stable
  apt update
  apt install -y canonical-certification-server
  /usr/lib/checkbox-provider-base/bin/stress_ng_test.py disk --device dm-0 --base-time 240

  stress_ng_test caused a kernel panic after about 5 minutes. I have
  attached dmesg output from my reproducer to this report.

  Initially, this was identified via a panic during the above test,
  which was running as part of a run of certify-soc-22.04.

  Attached is a tarball containing:

  - apport.linux-image-6.8.0-11-generic-64k.kzsondji.apport: The output of
    `ubuntu-bug linux` on the machine (after reboot)
  - reproduced-dmesg.202403201942: The dmesg output captured by kdump when I
    reproduced my original issue by running only the single stress_ng_test.py
    command above (not the entire cert suite)
  - original-dmesg.txt: The dmesg output I captured when the stress_ng_test
    originally failed during the full cert suite run

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2058557/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
