Ryan,

Part 1)
------

First, please try to reproduce the problem later, not so early in boot,
by disabling the bcache module on the kernel boot parameters, and then
loading it after the system has booted successfully. (This should be
possible, as you mentioned the boot disk isn't involved.)

1) Edit '/etc/fstab' and either comment out the mounts depending on
   bcache or add the 'noauto' option to them, so that systemd doesn't
   delay on boot.

   For example,

   $ sudo vim /etc/fstab

   From: /dev/mapper/*whatadisk* /mountpoint ext4 defaults 0 0
   To:   /dev/mapper/*whatadisk* /mountpoint ext4 defaults,noauto 0 0

   Esc, :x, Enter

2) Edit '/etc/default/grub' and add the 'modprobe.blacklist=bcache'
   option to GRUB_CMDLINE_LINUX_DEFAULT.

   For example,

   $ sudo vim /etc/default/grub

   From: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"
   To:   GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 modprobe.blacklist=bcache"

   Esc, :x, Enter

   Update and check grub config:

   $ sudo update-grub

   $ grep modprobe.blacklist=bcache /boot/grub/grub.cfg
   linux /boot/vmlinuz-4.15.0-91-generic ... modprobe.blacklist=bcache
   linux /boot/vmlinuz-4.15.0-88-generic ... modprobe.blacklist=bcache

3) Reboot the system into 4.15.0-91; it should not fail, as bcache is
   not loaded.

4) Now load bcache, retrigger device events, and check if the problem
   reproduces.

   $ sudo modprobe bcache
   $ sudo udevadm trigger

   This should register the bcache devices, e.g., /dev/bcache0.

If you can see /dev/bcache0 and the problem did NOT happen, please stop
here and let me know.

If the problem reproduced, please proceed after your system has rebooted
(it should boot normally, as it has bcache disabled).

...

Part 2)
------

1) Install linux-crashdump:

   $ sudo apt install linux-crashdump

   Answer these questions:
   - Should kexec-tools handle reboots (sysvinit only)? No
   - Should kdump-tools be enabled by default? Yes

2) Increase the reserved memory size for the crashdump kernel:

   Edit '/etc/default/grub.d/kdump-tools.cfg' and change the crashkernel
   size from 192M to 512M, or 768M if possible.

   For example,

   $ sudo vim /etc/default/grub.d/kdump-tools.cfg

   From: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:192M"
   To:   GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:768M"

   Esc, :x, Enter

3) Update grub and reboot:

   $ sudo update-grub
   $ sudo reboot

4) Check that the kdump status is 'ready' and that panic_on_oops is
   enabled (1) by default:

   $ sudo kdump-config status
   current state: ready to kdump

   $ cat /proc/sys/kernel/panic_on_oops
   1

5) Trigger a test crashdump:

   $ echo 1 | sudo tee /proc/sys/kernel/sysrq
   $ echo c | sudo tee /proc/sysrq-trigger

   This apparently 'reboots' the system, and collects a memory dump:

   [    8.510809] kdump-tools[781]: Starting kdump-tools:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202004081540/dump-incomplet$
   ...
   Copying data : [100.0 %] - eta: 0s
   ...
   [   15.964149] kdump-tools[781]:  * kdump-tools: saved vmcore in /var/crash/202004081540
   ...
   [   16.176388] kdump-tools[781]:  * kdump-tools: saved dmesg content in /var/crash/202004081540
   ...
   [   17.187848] kdump-tools[781]: Rebooting.
   ...

6) After the system boots again, check that the crashdump is stored in
   /var/crash/<timestamp>:

   $ ls -1 /var/crash/202004081540
   dmesg.202004081540
   dump.202004081540

   If this didn't happen, please stop and let me know, so we can fix
   the crashdump mechanism.

   If you have /var/crash/<timestamp>, the crashdump is working, so
   let's move forward. Feel free to remove that directory:

   $ sudo rm -rf /var/crash/<timestamp>

   ...

7) Boot again and reproduce the problem.

   Again, boot into 4.15.0-91 and reproduce the problem manually, as in
   step 4 of Part 1. This should generate a crashdump in /var/crash, as
   in the test crashdump.

   Please create a tarball and attach it to Launchpad.

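   Before creating the tarball, it may help to confirm that the new
   crashdump directory is complete. The following is only a rough
   sketch, assuming the same layout as the test crashdump above (a
   timestamp directory holding dump.* and dmesg.* files). The function
   names newest_crash_dir and check_crash_dir are made up for
   illustration and are not part of kdump-tools. Depending on
   permissions, it may need to run as root.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not provided by kdump-tools.

# Print the newest timestamp directory under $1 (default: /var/crash).
# Assumes directory names are fixed-width timestamps such as 202004081540,
# so that glob (lexicographic) order matches chronological order.
newest_crash_dir() {
    base="${1:-/var/crash}"
    latest=""
    for d in "$base"/[0-9]*/; do
        [ -d "$d" ] && latest="${d%/}"
    done
    [ -n "$latest" ] || return 1
    echo "$latest"
}

# Return 0 if directory $1 holds at least one dump.* and one dmesg.* file.
check_crash_dir() {
    dir="$1"
    for prefix in dump dmesg; do
        found=no
        for f in "$dir/$prefix".*; do
            [ -f "$f" ] && found=yes
        done
        if [ "$found" = no ]; then
            echo "missing $prefix.* in $dir" >&2
            return 1
        fi
    done
    echo "crashdump looks complete: $dir"
}

# Example use:
#   dir=$(newest_crash_dir /var/crash) && check_crash_dir "$dir"
```

   If the check passes, the directory it prints is the <timestamp>
   directory to use in the tarball command:
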
   $ sudo tar cvf lp1867916-crashdump.tar /var/crash/<timestamp>

If there are attachment size limit issues, please let me know, or use
another hosting website, if at all possible.

Thank you very much,
Mauricio

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867916

Title:
  Regression in kernel 4.15.0-91 causes kernel panic with Bcache

Status in linux package in Ubuntu:
  In Progress

Bug description:
  After upgrading from kernel 4.15.0-88 to 4.15.0-91 one of our systems
  does not boot any longer. It always crashes during boot with a kernel
  panic.

  I suspect that this crash might be related to Bcache because this is
  the only one of our systems where we use Bcache and the kernel panic
  appears right after Bcache initialization.

  I already checked that this bug still exists in the 4.15.0-92.93
  kernel from proposed.

  Unfortunately, I cannot do a bisect because this is a critical
  production system and we do not have any other system with a similar
  configuration.

  I attached a screenshot with the trace of the kernel panic.

  The last message that appears before the kernel panic (or rather the
  last one that I can see - there is a rather long pause between that
  message and the panic and I cannot scroll up far enough to ensure that
  there are no other messages in between) is:

  bcache: register_bcache() error /dev/dm-0: device already registered

  When booting with kernel 4.15.0-88 that does not have this problem,
  the next message is

  bcache: register_bcache() error /dev/dm-12: device already registered
  (emitting change event)

  After that the next message is:

  Begin: Loading essential drivers ... done

  This message also appears after the kernel panic, but the boot process
  stalls and the system can only be recovered by doing a hardware reset.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-88-generic 4.15.0-88.88
  ProcVersionSignature: Ubuntu 4.15.0-88.88-generic 4.15.18
  Uname: Linux 4.15.0-88-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 17 21:08 seq
   crw-rw---- 1 root audio 116, 33 Mar 17 21:08 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.11
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Wed Mar 18 12:55:18 2020
  HibernationDevice: RESUME=UUID=40512ea2-9fce-40f5-8362-5daf955cc26a
  InstallationDate: Installed on 2013-07-02 (2450 days ago)
  InstallationMedia: Ubuntu-Server 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
  MachineType: HP ProLiant DL160 G6
  PciMultimedia:

  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-88-generic root=/dev/mapper/vg0-root ro nosmt nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-88-generic N/A
   linux-backports-modules-4.15.0-88-generic  N/A
   linux-firmware                             1.173.16
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-09-23 (541 days ago)
  dmi.bios.date: 11/06/2009
  dmi.bios.vendor: HP
  dmi.bios.version: O33
  dmi.chassis.asset.tag: 0191525
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrO33:bd11/06/2009:svnHP:pnProLiantDL160G6:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL160 G6
  dmi.sys.vendor: HP

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867916/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp