I've tried a couple of memory testers:
 - Userspace 'memtester', which passed overnight
 - The kernel's 'memtest' cmdline arg, which also passed during early boot.

Running stress-ng afterwards still reported errors.

Attached is the console log from the kernel 'memtest' run. Note that I saw 3 ECC errors here; the last 2 do appear to be at userspace addresses:

$ grep 'ECC error' console.log
lundmark login: [13515.008197] Synchronous External Abort: synchronous parity or ECC error (0x86000018) at 0x0000aaabd45d5fff
[13516.695278] Synchronous External Abort: synchronous parity or ECC error (0x86000018) at 0x0000ffffa12d2e90
[13525.127804] Synchronous External Abort: synchronous parity or ECC error (0x86000018) at 0x0000ffff80007fc8

** Attachment added: "console.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1749685/+attachment/5122740/+files/console.log

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1749685

Title:
  Kernel panic on ThunderX

Status in linux package in Ubuntu:
  Invalid

Bug description:
  While doing testing on lundmark, I observed (from time to time) panics
  on 4.13.0-32.35~16.04.1-generic - I got this one while deploying the
  board:

  Booting under MAAS direction...
  [ grub.cfg-40:8d:5c:ba 606B 100% 1.56KiB/s ]
  EFI stub: Booting Linux Kernel...
  [ boot-initrd 46.78MiB 100% 6.57MiB/s ]
  EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
  EFI stub: Using DTB from configuration table
  EFI stub: Exiting boot services and installing virtual address map...
  [ 0.000000] Booting Linux on physical CPU 0x0
  [ 0.000000] random: get_random_bytes called from start_kernel+0x50/0x460 with crng_init=0
  [ 0.000000] Linux version 4.13.0-32-generic (buildd@bos01-arm64-018) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.5)) #35~16.04.1-Ubuntu SMP Thu Jan 25 10:10:26 UTC 2018 (Ubuntu 4.13.0-32.35~16.04.1-generic 4.13.13)
  [ 0.000000] Boot CPU: AArch64 Processor [431f0a11]
  [ 0.000000] Machine model: cavium,thunder-88xx
  [ 0.000000] efi: Getting EFI parameters from FDT:
  [ 0.000000] efi: EFI v2.40 by American Megatrends
  [ 0.000000] efi: ESRT=0x1ffce5ac18 SMBIOS 3.0=0x1ffce5a918 ACPI 2.0=0x1ffeb46000
  [ 0.000000] esrt: Reserving ESRT space from 0x0000001ffce5ac18 to 0x0000001ffce5ac50.
  [ 0.000000] NUMA: NODE_DATA [mem 0x1fff0c4d00-0x1fff0c7fff]
  [ 0.000000] Zone ranges:
  [ 0.000000] DMA [mem 0x0000000000500000-0x00000000ffffffff]
  [ 0.000000] Normal [mem 0x0000000100000000-0x0000001fff0fffff]
  [ 0.000000] Movable zone start for each node
  [ 0.000000] Early memory node ranges
  ...
  [ 0.000000] Kernel command line: BOOT_IMAGE=ubuntu/arm64/hwe-16.04/xenial/daily/boot-kernel nomodeset root=squash:http://10.229.32.21:5248/images/ubuntu/arm64/hwe-16.04/xenial/daily/squashfs ro ip=::::lundmark:BOOTIF ip6=off overlayroot=tmpfs overlayroot_cfgdisk=disabled cc:{datasource_list: [MAAS]}end_cc cloud-config-url=http://10.229.32.21:5240/MAAS/metadata/latest/by-id/ttctk4/?op=get_preseed apparmor=0 log_host=10.229.32.21 log_port=514 BOOTIF=01-40:8d:5c:ba:cd:d4
  ...
  [ 9.058541] Synchronous External Abort: synchronous parity or ECC error (0x86000018) at 0x0000ffff9658fc9c
  [ 9.058545] Internal error: : 86000018 [#1] SMP
  [ 9.058548] Modules linked in: ast(+) i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk sysimgblt aes_ce_cipher fb_sys_fops crc32_ce crct10dif_ce drm ghash_ce sha2_ce sha1_ce ahci libahci thunder_bgx(+) i2c_thunderx(+) thunder_xcv i2c_smbus ipmi_ssif mdio_thunder thunderx_mmc mdio_cavium ipmi_devintf ipmi_msghandler aes_neon_bs aes_neon_blk crypto_simd cryptd
  [ 9.058588] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.13.0-32-generic #35~16.04.1-Ubuntu
  [ 9.058589] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
  [ 9.058591] task: ffff801f700c6900 task.stack: ffff801f700cc000
  [ 9.058600] PC is at __remove_hrtimer+0x48/0xa8
  [ 9.058602] LR is at __remove_hrtimer+0x5c/0xa8
  [ 9.058604] pc : [<ffff000008153d88>] lr : [<ffff000008153d9c>] pstate: 004001c5
  [ 9.058606] sp : ffff801f79787e60
  [ 9.058607] x29: ffff801f79787e60 x28: ffff801f700c6900
  [ 9.058611] x27: 000000021bc7a0ca x26: ffff000008fcd000
  [ 9.058614] x25: 0000000000000001 x24: ffff000008fcd000
  [ 9.058617] x23: ffff0000093b9658 x22: ffff801f7978f598
  [ 9.058620] x21: ffff801f7978f5c0 x20: ffff801f7978f580
  [ 9.058624] x19: ffff801f7978fa00 x18: 0000ffffc8e8cb78
  [ 9.058627] x17: 000000000000668a x16: 0000000000000000
  [ 9.058630] x15: 0000ffff968adcc0 x14: 343030302c333030
  [ 9.058633] x13: 302c323030302c31 x12: 3030302c30303030
  [ 9.058636] x11: 0000aaaac635fa10 x10: 0000000000000b00
  [ 9.058639] x9 : 0000000000000040 x8 : ffff801f780026f0
  [ 9.058643] x7 : 0000000000000000 x6 : ffff801f7978fa00
  [ 9.058646] x5 : 0000000000000000 x4 : ffff801f7978fb58
  [ 9.058649] x3 : ffff801f7978fb58 x2 : ffff801f7978fb58
  [ 9.058652] x1 : ffff801f7978f5d0 x0 : 0000000000000001
  [ 9.058656] Process swapper/7 (pid: 0, stack limit = 0xffff801f700cc000)
  [ 9.058658] Stack: (0xffff801f79787e60 to 0xffff801f700d0000)
  [ 9.058660] Call trace:
  [ 9.058662] Exception stack(0xffff801f79787c70 to 0xffff801f79787da0)
  [ 9.058665] 7c60: ffff801f7978fa00 0001000000000000
  [ 9.058668] 7c80: 000000000242d000 ffff000008153d88 00000000004001c5 ffff801f79787d38
  [ 9.058671] 7ca0: ffff801f79787cd0 ffff0000081070a4 ffff801f79787cd0 ffff0000081070b4
  [ 9.058674] 7cc0: ffff801f6f1b9e00 ffff0000093b8c08 ffff801f79787d50 ffff000008107318
  [ 9.058677] 7ce0: ffff801f6f1b9e00 ffff801f79791c10 ffff801f79795020 0000000000000000
  [ 9.058680] 7d00: 0000000000000100 0000000000000007 ffff000009555658 ffff0000093b8000
  [ 9.058682] 7d20: ffff801f79787d50 0000000000040d00 0000000000000001 ffff801f7978f5d0
  [ 9.058685] 7d40: ffff801f7978fb58 ffff801f7978fb58 ffff801f7978fb58 0000000000000000
  [ 9.058688] 7d60: ffff801f7978fa00 0000000000000000 ffff801f780026f0 0000000000000040
  [ 9.058691] 7d80: 0000000000000b00 0000aaaac635fa10 3030302c30303030 302c323030302c31
  [ 9.058695] [<ffff000008153d88>] __remove_hrtimer+0x48/0xa8
  [ 9.058697] [<ffff000008153f74>] __hrtimer_run_queues+0xbc/0x2a8
  [ 9.058700] [<ffff000008154b08>] hrtimer_interrupt+0xa8/0x228
  [ 9.058707] [<ffff0000088cde24>] arch_timer_handler_phys+0x3c/0x50
  [ 9.058711] [<ffff00000813dbc4>] handle_percpu_devid_irq+0x8c/0x230
  [ 9.058714] [<ffff000008137914>] generic_handle_irq+0x34/0x50
  [ 9.058716] [<ffff000008138018>] __handle_domain_irq+0x68/0xc0
  [ 9.058719] [<ffff0000080816ec>] gic_handle_irq+0xcc/0x188
  [ 9.058721] Exception stack(0xffff801f700cfe00 to 0xffff801f700cff30)
  [ 9.058724] fe00: ffff000008fcd000 0000000000000000 0000000000000000 ffff000008fd6000
  [ 9.058727] fe20: 0000801f707b7000 ffff801f700cff20 0000801f707b7000 ffff0000093b8698
  [ 9.058729] fe40: 0000000000000000 ffff801f700cfe90 0000000000000b00 0000aaaac635fa10
  [ 9.058732] fe60: 3030302c30303030 302c323030302c31 343030302c333030 0000ffff968adcc0
  [ 9.058735] fe80: 0000000000000000 0000000000006688 0000ffffc8e8cb78 ffff000008fcd000
  [ 9.058738] fea0: ffff0000093b9658 ffff0000093b9000 ffff000008fdd348 0000000000000000
  [ 9.058741] fec0: 0000000000000000 ffff801f700c6900 0000000000000000 0000000000000000
  [ 9.058743] fee0: 0000000000000000 ffff801f700cff30 ffff0000080859bc ffff801f700cff30
  [ 9.058746] ff00: ffff0000080859c0 0000000000400145 ffff801f700cff20 ffff0000081489d8
  [ 9.058748] ff20: ffffffffffffffff ffff000008148a54
  [ 9.058751] [<ffff00000808315c>] el1_irq+0xdc/0x180
  [ 9.058754] [<ffff0000080859c0>] arch_cpu_idle+0x30/0x168
  [ 9.058760] [<ffff000008122bc4>] do_idle+0x114/0x1e0
  [ 9.058763] [<ffff000008122e64>] cpu_startup_entry+0x2c/0x30
  [ 9.058767] [<ffff000008092308>] secondary_start_kernel+0x108/0x118
  [ 9.058770] [<00000000018781c4>] 0x18781c4
  [ 9.058773] Code: 370000c0 a94153f3 a9425bf5 f9401bf7 (a8c47bfd)
  [ 9.058803] ---[ end trace 963acec48f21d263 ]---
  [ 9.058805] Kernel panic - not syncing: Fatal exception in interrupt
  [ 9.058824] SMP: stopping secondary CPUs
  [ 9.059063] Kernel Offset: disabled
  [ 9.059066] CPU features: 0x101108
  [ 9.059067] Memory Limit: none
  [ 9.656325] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1749685/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp