Hello! We saw some surprising behavior: shortly after the last exchange in this thread, the bug simply disappeared for nearly two months. We still had no luck reproducing it.
We used this opportunity to migrate and reboot part of our servers to activate kdump on them, and decided to wait. A couple of days ago one of our hypervisors hung, and we got our crash kernel dump :) The kernel version was 5.4.0-73-generic this time.

Now that we have it, could somebody please have a look at it? The file is quite large, ~2.5 GB (3.2 GB unpacked):
https://drive.google.com/file/d/1JVMWJpXNeou06UxqJwl5wjbLKzcb2rOq/view?usp=sharing

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

https://bugs.launchpad.net/bugs/1921355

Title: cgroups related kernel panics

Status in linux package in Ubuntu: Confirmed
Status in linux-hwe-5.4 package in Ubuntu: Confirmed

Bug description:

Hi! Over the last 6 months we've upgraded our hypervisor compute hosts from the Ubuntu bionic kernel 4.15.* to the Ubuntu bionic HWE kernel 5.4.

This month we noticed that several nodes failed due to bugs in cgroups. The trace was different almost every time, but it always revolves around cgroups - either a null pointer failure or a panic caught by the BUG_ON() macro. It looked as if some cgroup no longer existed but something still tried to access it, causing the kernel panic. Please find the logs attached.

3 of 4 cases happened after a VM shutdown. We tried to spawn lots of VMs, load them, and shut them down, but didn't manage to reproduce the behavior. Every case is somewhat different: patch kernel versions vary (5.4.0-42 to 5.4.0-66), and so does uptime (from 1 day to about half a year). There are also lots of hosts with several months of uptime and no issues. On 4.15 we never saw this behavior at all.

That's quite disturbing, as I don't want dozens of VMs to crash (due to a host outage) at random times for some vague reason... I didn't manage to find any related bugs on the bug tracker, so I am creating this one.
I wonder whether anybody in the community has come across something like this. Could somebody advise on how to debug further, or where else to report or look for a similar case?