Hello!

We saw some surprising behavior.
Shortly after the last exchange in this thread, the bug simply disappeared
for nearly two months.
We still had no luck reproducing it.

We used this opportunity to migrate and reboot some of our servers to
activate kdump on them, and decided to wait.
A couple of days ago one of our hypervisors hung, and we got our kernel
crash dump :)
The kernel version was 5.4.0-73-generic this time.

Now that we have it, could somebody please have a look at it?
The file is quite large: ~2.5 GB (3.2 GB unpacked).
https://drive.google.com/file/d/1JVMWJpXNeou06UxqJwl5wjbLKzcb2rOq/view?usp=sharing

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-5.4 in Ubuntu.
https://bugs.launchpad.net/bugs/1921355

Title:
  cgroups related kernel panics

Status in linux package in Ubuntu:
  Confirmed
Status in linux-hwe-5.4 package in Ubuntu:
  Confirmed

Bug description:
  Hi!

  Over the last 6 months we've upgraded our hypervisor compute hosts
  from the Ubuntu Bionic 4.15.* kernel to the Ubuntu Bionic HWE 5.4
  kernel.

  This month we noticed that several nodes failed due to bugs in cgroups.
  The trace was different almost every time, but it always revolved around
  cgroups: either null pointer dereferences or a panic caught by the
  BUG_ON() macro. It looked like some cgroup no longer existed but
  something still tried to access it, causing a kernel panic.
  Please find the logs attached.

  3 of 4 cases happened after a VM shutdown. We tried to spawn lots of
  VMs, load them, and shut them down, but didn't manage to reproduce the
  behavior.
  Every case is somewhat different: the kernel patch versions (5.4.0-42
  to 5.4.0-66) and uptimes (from 1 day to about half a year) vary. There
  are also lots of hosts with several months of uptime and no issues. On
  4.15 we never saw this behavior at all.
  That's quite disturbing, as I don't want dozens of VMs to crash (due to
  a host outage) at random times for some vague reason...
  I didn't manage to find any related bugs on the bug tracker, hence this
  report.

  I wonder if anybody in the community has come across something like this.
  Could somebody advise on how to debug further, or where else to report
  or look for a similar case?

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1921355/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
