Public bug reported:
Hi!
Over the last 6 months we have upgraded our hypervisor compute hosts from the
Ubuntu Bionic 4.15.* kernel to the Ubuntu Bionic HWE 5.4 kernel.
This month we noticed that several nodes failed due to bugs in cgroups.
The trace was different almost every time, but every one revolves around
cgroups: either a NULL pointer dereference or a panic caught by the BUG_ON()
macro. It looks as if some cgroup no longer existed but something still tried
to access it, causing a kernel panic.
Please find the logs attached.
3 of 4 cases happened after a VM shutdown. We tried to spawn lots of VMs, put
load on them, and shut them down, but did not manage to reproduce the behavior.
Every case is somewhat different: the kernel patch versions range from 5.4.0-42
to 5.4.0-66, and uptime varies from 1 day to about half a year. There are also
many hosts with several months of uptime and no issues. On 4.15 we never saw
this behavior at all.
That's quite disturbing, as I don't want dozens of VMs to crash (due to a host
outage) at random times for some unclear reason.
I didn't manage to find any related bugs on the bug tracker, so I am creating
this one.
I wonder if anybody in the community has come across something like this.
Could somebody give advice on how to debug further, or where else to report or
look for a similar case?
** Affects: linux-hwe-5.4 (Ubuntu)
Importance: Undecided
Status: New
** Tags: cgroups
** Attachment added: "crash-030321.log"
https://bugs.launchpad.net/bugs/1921355/+attachment/5480836/+files/crash-030321.log
https://bugs.launchpad.net/bugs/1921355
Title:
cgroups related kernel panics