Unfortunately, the system is unusable this morning. Still trying to recover it. May have to flatline it again.
It seems I have gotten myself stuck in a loop:

1. Try to reboot, which causes a kernel panic.
2. After that happens a few times, the NVMe drive needs an fsck because of corrupt group descriptors.
3. Run `fsck -CVvfy` on the drive (twice for the ext partition and once for the EFI partition).
4. After doing 1-3 a few times, packages and symlinks start getting broken. I try to repair them manually until eventually I can't get into the system anymore.

I tried to run memtest. If it is set to one CPU at a time, it runs without error until it eventually hangs on a random (inconsistent) test. If I run it with all CPUs, it shows tons of errors pretty quickly, always on the same bit of every bank (for example: 80808080 -> 8080A080) and always off by two. But again, it doesn't do that unless multiple CPUs are running at the same time. I thought it could be the other security features (interleaving, memory encryption, etc.) that the BIOS has set to auto.

Launching the live USB and just sitting at a terminal with `journalctl --follow`, the last thing that happens before it hangs is usually cleaning temp files, but I haven't run that enough to know if it is a pattern.

From the BIOS, I can set it to auto overclock or manual; there is no option to disable overclocking. So I cleared the CMOS and tried again immediately after that, without any change.

I have attempted 44 bionic installs this month. 4 of those went through to completion: two normal and two minimal. The rest failed during ubiquity. grub-install almost always succeeds with acpi=off and almost always hangs without it. I also have to have pcie_aspm=off or the system is spammed with errors and crashes quickly. Others have reported the same thing for Threadripper. I have tried with and without livepatch enabled.

The system is stable when mining or gaming, and seems unstable when underutilized, so I tried disabling the C-states in the BIOS. I have tried disabling every form of power management I could find in the OS and in the BIOS.
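As a side note on the memtest pattern quoted above (80808080 -> 8080A080): XOR-ing the expected and observed words shows exactly which bits differ. This is only a minimal sketch, assuming those two values are representative 32-bit words from the memtest output; the helper name `flipped_bits` is made up for illustration and is not part of memtest.

```python
def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = expected ^ observed  # a 1 in every position where the words disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values taken from the memtest output quoted above.
expected = 0x80808080
observed = 0x8080A080

print(hex(expected ^ observed))          # mask of differing bits
print(flipped_bits(expected, observed))  # positions of differing bits
```

For the quoted pair the mask is 0x2000, so a single bit (bit 13) reads back as 1 instead of 0, which is why the hex digit 8 shows up as A.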
I am sure I have missed quite a few.

I have tried manually updating the kernel (per your requests) as well as using ukuu. Since it is my primary machine, I tend to have things installed that then have to be uninstalled for that to work well (like the nvidia drivers, virtualbox, etc.).

I am seeing a ton of segfaults, even from the live USB. They happen more often when the machine has been sitting idle for a few minutes (which is what had me thinking about power management). I thought it could be the memory, but they don't fail memtest (if I run it one CPU at a time)...

I know that "Erase disk and reinstall" will not solve the problem. It would be nice to figure out how to solve the problem before I do that again.

So... I'm not sure how I can try a new kernel for you. If there is some way for me to update a live USB with an alternate kernel from a live USB, that might work, since I see errors on the daily bionic ISO as well.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765838

Title:
  BUG: Bad rss-counter state mm:000000002ddfedce idx:2 val:-1

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Booted. Started firefox. A couple seconds later it was back on the lock screen. Logged in again and it hung.

  This is a fresh "Erase disk and reinstall" of minimal from yesterdays bionic iso

  Ubuntu 4.15.0-15.16-generic 4.15.15

  ProblemType: KernelOops
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-15-generic 4.15.0-15.16
  ProcVersionSignature: Ubuntu 4.15.0-15.16-generic 4.15.15
  Uname: Linux 4.15.0-15-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  Annotation: Your system might become unstable now and might need to be restarted.
  ApportVersion: 2.20.9-0ubuntu5
  Architecture: amd64
  Date: Fri Apr 20 12:13:12 2018
  Failure: oops
  InstallationDate: Installed on 2018-04-19 (0 days ago)
  InstallationMedia:
  MachineType: System manufacturer System Product Name
  OopsText:
   BUG: Bad rss-counter state mm:000000002ddfedce idx:2 val:-1
   TaskSchedulerFo[35882]: segfault at 5cf3c85816d8 ip 0000557fa5ed3ed0 sp 00007ff500037420 error 4 in chrome[557fa4a32000+5cd4000]
   traps: wget[35886] general protection ip:7fbe54e7d2ff sp:7ffc7aa235a0 error:0 in ld-2.27.so[7fbe54e70000+27000]
  ProcFB: 0 EFI VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-15-generic.efi.signed root=UUID=d9b05c55-71bb-4f4f-bdfd-43dd79de4c1d ro reboot=pci pcie_aspm=off
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions: kerneloops-daemon N/A
  SourcePackage: linux
  Title: BUG: Bad rss-counter state mm:000000002ddfedce idx:2 val:-1
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 12/21/2017
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 0902
  dmi.board.asset.tag: Default string
  dmi.board.name: ROG ZENITH EXTREME
  dmi.board.vendor: ASUSTeK COMPUTER INC.
  dmi.board.version: Rev 1.xx
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0902:bd12/21/2017:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGZENITHEXTREME:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: System Product Name
  dmi.product.version: System Version
  dmi.sys.vendor: System manufacturer

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765838/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

