[Kernel-packages] [Bug 1867916] Re: Regression in kernel 4.15.0-91 causes kernel panic with Bcache

Mauricio Faria de Oliveira Wed, 08 Apr 2020 09:31:28 -0700

Ryan,

Part 1)
------


First, please try to reproduce the problem later in the boot process
rather than early: disable the bcache module with a kernel boot
parameter, then load it manually after the system has booted successfully.
(This should be possible since, as you mentioned, the boot disk does not
depend on bcache.)

1) Edit '/etc/fstab' and either comment out the mounts that depend on
bcache or add the 'noauto' option to them, so that systemd does not wait
for those devices during boot.

For example,

$ sudo vim /etc/fstab
From: /dev/mapper/*whatadisk* /mountpoint ext4 defaults 0 0
To: /dev/mapper/*whatadisk* /mountpoint ext4 defaults,noauto 0 0
Esc, :x, Enter
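If you prefer not to edit /etc/fstab interactively, the 'noauto' change
can also be scripted. This is only a sketch, assuming a standard
whitespace-separated fstab; 'add_noauto' is a hypothetical helper name.
It prints the modified file to stdout and never touches the original, so
the result can be reviewed before it is installed.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab-style file with the 'noauto' option
# added to the entry mounted at MOUNTPOINT. The input file is not modified.
# The edited line is rejoined with single spaces; other lines are unchanged.
add_noauto() {
    fstab_file=$1
    MP=$2 awk '
        # Keep comments and incomplete lines exactly as they are.
        /^[ \t]*#/ || NF < 4 { print; next }
        # Field 2 is the mount point, field 4 the comma-separated options;
        # append noauto unless it is already present.
        $2 == ENVIRON["MP"] && ("," $4 ",") !~ /,noauto,/ { $4 = $4 ",noauto" }
        { print }
    ' "$fstab_file"
}

# Usage:
#   add_noauto /etc/fstab /mountpoint > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Copy /tmp/fstab.new over /etc/fstab (with sudo) only if the diff shows
exactly the intended change.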

2) Edit '/etc/default/grub' and add the 'modprobe.blacklist=bcache' option
to GRUB_CMDLINE_LINUX_DEFAULT.

For example,

$ sudo vim /etc/default/grub
From: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"
To: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 modprobe.blacklist=bcache"
Esc, :x, Enter
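The GRUB edit can be scripted in the same way. A sketch only, assuming
GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line as in the
example; 'add_grub_default_option' is a hypothetical helper name, and it
prints the result instead of modifying the file.

```shell
#!/bin/sh
# Hypothetical helper: print a grub defaults file with OPTION appended to
# GRUB_CMDLINE_LINUX_DEFAULT. The input file is not modified.
# OPTION must not contain '/', '&' or '\', which sed would interpret.
add_grub_default_option() {
    grub_file=$1
    option=$2
    # Nothing to do if the option is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$option"; then
        cat "$grub_file"
        return 0
    fi
    # GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"
    #   becomes GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 OPTION"
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT=".*\)"$/\1 '"$option"'"/' "$grub_file"
}

# Usage:
#   add_grub_default_option /etc/default/grub modprobe.blacklist=bcache > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
```

After reviewing the diff and copying the file into place, update-grub is
still required for the change to take effect.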

Update the grub configuration and check that the option is present:

$ sudo update-grub

$ grep modprobe.blacklist=bcache /boot/grub/grub.cfg
                linux   /boot/vmlinuz-4.15.0-91-generic ... modprobe.blacklist=bcache
                linux   /boot/vmlinuz-4.15.0-88-generic ... modprobe.blacklist=bcache

3) Reboot the system into 4.15.0-91. It should boot normally, since
bcache is not loaded.

4) Now load bcache, retrigger device events, and check if the problem
reproduces.

$ sudo modprobe bcache
$ sudo udevadm trigger

This should register the bcache devices, e.g., /dev/bcache0.

If you can see /dev/bcache0 and the problem did NOT happen,
please stop here and let me know.
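To check for registered devices without browsing /dev by hand, a sketch
like the following could be used. It assumes that each registered bcache
device appears as /sys/block/bcacheN (with the matching /dev/bcacheN
node); 'list_bcache_devices' is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print /dev/bcacheN for every bcache device the
# kernel currently exposes under /sys/block (path overridable for tests).
# Returns non-zero if none is found.
list_bcache_devices() {
    block_dir=${1:-/sys/block}
    status=1
    for dev in "$block_dir"/bcache*; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$dev" ] || continue
        echo "/dev/${dev##*/}"
        status=0
    done
    return "$status"
}

# Usage, after 'sudo modprobe bcache' and 'sudo udevadm trigger':
#   list_bcache_devices || echo "no bcache devices registered yet"
```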

If the problem reproduces, please continue with Part 2 once the system
has rebooted (it should boot normally, since bcache is disabled at boot).

...

Part 2)
------

1) Install linux-crashdump:

$ sudo apt install linux-crashdump

Answer these questions:

- Should kexec-tools handle reboots (sysvinit only)? No
- Should kdump-tools be enabled by default? Yes

2) Increase the reserved memory size for the crashdump kernel:

Edit '/etc/default/grub.d/kdump-tools.cfg' and change the crashkernel
size from 192M to 512M, or to 768M if the system has enough memory:

For example,

$ sudo vim /etc/default/grub.d/kdump-tools.cfg
From: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:192M"
To: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:768M"
Esc, :x, Enter
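The same change can be made non-interactively. A sketch, assuming the
file contains the 'crashkernel=512M-:192M' setting shown above;
'set_crashkernel_size' is a hypothetical helper name. In the kernel's
crashkernel=<range>:<size> syntax, '512M-:768M' means "reserve 768M for
the crashdump kernel when the machine has 512M of RAM or more".

```shell
#!/bin/sh
# Hypothetical helper: print kdump-tools.cfg with the crashkernel size for
# the '512M-' range replaced by SIZE (for example 768M).
# The input file is not modified.
set_crashkernel_size() {
    cfg_file=$1
    size=$2
    # Accept only simple sizes such as 512M or 768M, so the value is safe
    # to splice into the sed expression below.
    case $size in
        ''|*[!0-9KMG]*) echo "invalid size: $size" >&2; return 1 ;;
    esac
    sed 's/crashkernel=512M-:[0-9][0-9]*[KMG]/crashkernel=512M-:'"$size"'/' "$cfg_file"
}

# Usage:
#   set_crashkernel_size /etc/default/grub.d/kdump-tools.cfg 768M > /tmp/kdump-tools.cfg.new
#   diff /etc/default/grub.d/kdump-tools.cfg /tmp/kdump-tools.cfg.new
```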

4) Update grub and reboot

$ sudo update-grub
$ sudo reboot

5) Check that the kdump status is 'ready to kdump' and that
panic_on_oops is enabled (1) by default:

$ sudo kdump-config status
current state:    ready to kdump

$ cat /proc/sys/kernel/panic_on_oops 
1
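Both checks can be scripted before triggering the test crashdump. This is
a sketch with hypothetical helper names ('kdump_status_ok',
'panic_on_oops_enabled'); it assumes 'kdump-config status' prints a
'current state:' line like the one above.

```shell
#!/bin/sh
# Hypothetical helpers for a pre-flight check before the test crashdump.

# Reads 'kdump-config status' output on stdin and succeeds only if the
# current state reads 'ready to kdump' (a state such as
# 'Not ready to kdump' does not match).
kdump_status_ok() {
    grep -Eq 'current state:[[:space:]]+ready to kdump'
}

# Succeeds if the panic_on_oops sysctl file contains 1
# (path overridable for testing).
panic_on_oops_enabled() {
    value_file=${1:-/proc/sys/kernel/panic_on_oops}
    [ "$(cat "$value_file")" = "1" ]
}

# Usage:
#   sudo kdump-config status | kdump_status_ok && panic_on_oops_enabled &&
#       echo "ready for the test crashdump"
```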

6) Trigger a test crashdump

$ echo 1 | sudo tee /proc/sys/kernel/sysrq
$ echo c | sudo tee /proc/sysrq-trigger

This crashes the kernel on purpose, so it looks like a reboot: the
crashdump kernel collects a memory dump and then reboots the system:

[    8.510809] kdump-tools[781]: Starting kdump-tools:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202004081540/dump-incomplet$
...
Copying data                                      : [100.0 %] -           eta: 0s
...
[   15.964149] kdump-tools[781]:  * kdump-tools: saved vmcore in /var/crash/202004081540
...
[   16.176388] kdump-tools[781]:  * kdump-tools: saved dmesg content in /var/crash/202004081540
...
[   17.187848] kdump-tools[781]: Rebooting.
...

7) After the system boots again, check that the crashdump is stored in
/var/crash/<timestamp>:

$ ls -1 /var/crash/202004081540
dmesg.202004081540
dump.202004081540

If this didn't happen, please stop and let me know, so we can fix the
crashdump mechanism.
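To locate the most recent crashdump directory and confirm that it holds
both files, a sketch like this may help. It assumes the <timestamp>
directories are named as in the listing above (fixed-width numbers that
appear to be YYYYMMDDHHMM, so they sort chronologically);
'latest_crash_dir' is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the newest <timestamp> directory under the
# crash root (default /var/crash) that contains both dump.<timestamp> and
# dmesg.<timestamp>. Returns non-zero if there is none.
latest_crash_dir() {
    crash_root=${1:-/var/crash}
    latest=
    # Glob results come back sorted, and fixed-width numeric timestamps
    # sort chronologically, so the last complete match is the newest one.
    for dir in "$crash_root"/[0-9]*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        stamp=${dir##*/}
        [ -e "$dir/dump.$stamp" ] && [ -e "$dir/dmesg.$stamp" ] || continue
        latest=$dir
    done
    [ -n "$latest" ] && echo "$latest"
}

# Usage:
#   latest_crash_dir || echo "no complete crashdump found in /var/crash"
```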

If /var/crash/<timestamp> exists, the crashdump mechanism is working and
we can move forward. Feel free to remove that directory:

$ sudo rm -rf /var/crash/<timestamp>

...

8) Boot again and reproduce the problem.

Boot into 4.15.0-91 again and reproduce the problem manually, as in
step 4 of Part 1.

This should generate a crashdump in /var/crash, as the test crashdump
did. Please create a tarball of it and attach it to Launchpad:

$ sudo tar cvf lp1867916-crashdump.tar /var/crash/<timestamp>
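A variant of the tar command above stores the directory under its own
name (GNU tar otherwise keeps the 'var/crash/' prefix, after stripping
the leading '/') and prints the archive size, which helps when checking
attachment limits. 'make_crash_tarball' is a hypothetical helper, shown
only as a sketch; like the original command, it may need root privileges
to read the dump.

```shell
#!/bin/sh
# Hypothetical helper: archive one crash directory, storing paths relative
# to its parent, then print the tarball size in bytes.
make_crash_tarball() {
    crash_dir=$1                              # e.g. /var/crash/202004081540
    tarball=${2:-lp1867916-crashdump.tar}
    tar -cf "$tarball" -C "$(dirname "$crash_dir")" "$(basename "$crash_dir")" || return 1
    wc -c < "$tarball"
}

# Usage (as root):
#   make_crash_tarball /var/crash/<timestamp> lp1867916-crashdump.tar
```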

If you run into attachment size limits, please let me know, or use
another file hosting website if at all possible.

Thank you very much,
Mauricio

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1867916

Title:
  Regression in kernel 4.15.0-91 causes kernel panic with Bcache

Status in linux package in Ubuntu:
  In Progress

Bug description:
  After upgrading from kernel 4.15.0-88 to 4.15.0-91 one of our systems
  does not boot any longer. It always crashes during boot with a kernel
  panic.

  I suspect that this crash might be related to Bcache because this is
  the only one of our systems where we use Bcache and the kernel panic
  appears right after Bcache initialization.

  I already checked that this bug still exists in the 4.15.0-92.93
  kernel from proposed.

  Unfortunately, I cannot do a bisect because this is a critical
  production system and we do not have any other system with a similar
  configuration.

  I attached a screenshot with the trace of the kernel panic.

  The last message that appears before the kernel panic (or rather the
  last one that I can see - there is a rather long pause between that
  message and the panic and I cannot scroll up far enough to ensure that
  there are no other messages in between) is:

  bcache: register_bcache() error /dev/dm-0: device already registered

  When booting with kernel 4.15.0-88 that does not have this problem,
  the next message is

  bcache: register_bcache() error /dev/dm-12: device already registered
  (emitting change event)

  After that the next message is:

  Begin: Loading essential drivers ... done

  This message also appears after the kernel panic, but the boot process
  stalls and the system can only be recovered by doing a hardware reset.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-88-generic 4.15.0-88.88
  ProcVersionSignature: Ubuntu 4.15.0-88.88-generic 4.15.18
  Uname: Linux 4.15.0-88-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 17 21:08 seq
   crw-rw---- 1 root audio 116, 33 Mar 17 21:08 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.11
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Wed Mar 18 12:55:18 2020
  HibernationDevice: RESUME=UUID=40512ea2-9fce-40f5-8362-5daf955cc26a
  InstallationDate: Installed on 2013-07-02 (2450 days ago)
  InstallationMedia: Ubuntu-Server 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
  MachineType: HP ProLiant DL160 G6
  PciMultimedia:
   
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-88-generic root=/dev/mapper/vg0-root ro nosmt nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw nomdmonddf nomdmonisw
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-88-generic N/A
   linux-backports-modules-4.15.0-88-generic  N/A
   linux-firmware                             1.173.16
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to bionic on 2018-09-23 (541 days ago)
  dmi.bios.date: 11/06/2009
  dmi.bios.vendor: HP
  dmi.bios.version: O33
  dmi.chassis.asset.tag: 0191525
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrO33:bd11/06/2009:svnHP:pnProLiantDL160G6:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL160 G6
  dmi.sys.vendor: HP

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
