Hi Guido,

thanks for your reply!
On Wed, 17 Nov 2021 13:11:05 +0100 Guido Günther <g...@godiug.net> wrote:
> here's the missing link:
>
> https://salsa.debian.org/libvirt-team/libvirt/-/merge_requests/116

I took the artifacts from there and did another test run. At first
glance everything looked good, but then it failed on destruction of
the domain again, with:

error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d4078\x2drouter.scope/system.slice/cgroup.controllers': No such file or directory

The domain is still fully cleaned up afterwards (the cgroup subtree
under machine.slice is gone completely), but the error message is
still annoying.

To me it looks like the recursion in the libvirt code is racing
against the domain's own cleanup somehow, because I get different
error messages and sometimes the destroy even succeeds:

root@down ~ › virsh -c lxc:/// destroy router
Domain 'router' destroyed

root@down ~ › virsh -c lxc:/// start router
Domain 'router' started

root@down ~ › virsh -c lxc:/// destroy router
error: Failed to destroy domain 'router'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d4078\x2drouter.scope/system.slice/cgroup.controllers': No such file or directory

root@down ~ › virsh -c lxc:/// start router
Domain 'router' started

root@down ~ › virsh -c lxc:/// destroy router
error: Failed to destroy domain 'router'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1505\x2drouter.scope/system.slice/system-openvpn.slice/cgroup.controllers': No such file or directory

root@down ~ › dpkg -l | grep libvirt
ii  libvirt-clients                 7.0.0-4~1.gbp7decb2  amd64  Programs for the libvirt library
ii  libvirt-daemon                  7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon
ii  libvirt-daemon-config-network   7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (default network)
ii  libvirt-daemon-config-nwfilter  7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (default network filters)
ii  libvirt-daemon-driver-lxc       7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon LXC connection driver
ii  libvirt-daemon-driver-qemu      7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon QEMU connection driver
ii  libvirt-daemon-system           7.0.0-4~1.gbp7decb2  amd64  Libvirt daemon configuration files
ii  libvirt-daemon-system-systemd   7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (systemd)
ii  libvirt0:amd64                  7.0.0-4~1.gbp7decb2  amd64  library for interfacing with different virtualization systems
ii  python3-libvirt                 7.0.0-2              amd64  libvirt Python 3 bindings

I took a brief look at the code, but at first glance I have no idea
how to gracefully deal with this race (a naive idea is sketched at the
very bottom of this mail, after the quote). Unfortunately, a Google
search did not turn up anything useful either. Sometimes it takes many
attempts (10+) to reproduce this :/.

Kind regards,
Jonas

> Cheers,
>  -- Guido
>
> > > Message-Id: <ea7d0ca37cce76e1327945c4864b996d7fd6d2e6.1618903455.git.mpriv...@redhat.com>
> > > From: Michal Privoznik <mpriv...@redhat.com>
> > > Date: Fri, 16 Apr 2021 16:39:14 +0200
> > > Subject: [PATCH] vircgroup: Fix virCgroupKillRecursive() wrt nested
> > >  controllers
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > >
> > > I've encountered the following bug, but only on Gentoo with
> > > systemd and CGroupsV2.
> > > I've started an LXC container successfully,
> > > but destroying it reported the following error:
> > >
> > >   error: Failed to destroy domain 'amd64'
> > >   error: internal error: failed to get cgroup backend for 'pathOfController'
> > >
> > > Debugging showed that the CGroup hierarchy is full of surprises:
> > >
> > > /sys/fs/cgroup/machine.slice/machine-lxc\x2d861\x2damd64.scope/
> > > └── libvirt
> > >     ├── dev-hugepages.mount
> > >     ├── dev-mqueue.mount
> > >     ├── init.scope
> > >     ├── sys-fs-fuse-connections.mount
> > >     ├── sys-kernel-config.mount
> > >     ├── sys-kernel-debug.mount
> > >     ├── sys-kernel-tracing.mount
> > >     ├── system.slice
> > >     │   ├── console-getty.service
> > >     │   ├── dbus.service
> > >     │   ├── system-getty.slice
> > >     │   ├── system-modprobe.slice
> > >     │   ├── systemd-journald.service
> > >     │   ├── systemd-logind.service
> > >     │   └── tmp.mount
> > >     └── user.slice
> > >
> > > For comparison, here's the same container on recent Rawhide:
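
PS: To make the hand-waving above a bit more concrete, here is roughly
the direction I was thinking in: treat ENOENT anywhere under the scope
as "that group is already gone" and carry on, instead of failing the
whole destroy. This is a standalone sketch in plain POSIX C, not a
patch against the actual virCgroupKillRecursive() code paths, so all
names apart from the cgroup v2 file names are made up:

/* kill-cgroup-sketch.c: standalone illustration, NOT libvirt code.
 *
 * Recursively kill a cgroup v2 subtree, treating ENOENT at every step
 * as "the container's own teardown beat us to it" rather than as an
 * error to propagate.
 */
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int kill_cgroup_recursive(const char *path)
{
    char buf[PATH_MAX];
    struct dirent *ent;
    DIR *dir;
    FILE *fp;
    int pid;

    /* Descend into child cgroups first (leaves before parents). */
    if (!(dir = opendir(path)))
        return errno == ENOENT ? 0 : -1;   /* already gone: not an error */

    while ((ent = readdir(dir))) {
        if (ent->d_type != DT_DIR ||
            strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(buf, sizeof(buf), "%s/%s", path, ent->d_name);
        if (kill_cgroup_recursive(buf) < 0) {
            closedir(dir);
            return -1;
        }
    }
    closedir(dir);

    /* Kill whatever is still listed in this group's cgroup.procs. */
    snprintf(buf, sizeof(buf), "%s/cgroup.procs", path);
    if (!(fp = fopen(buf, "r")))
        return errno == ENOENT ? 0 : -1;   /* raced with cleanup: fine */

    while (fscanf(fp, "%d", &pid) == 1)
        kill((pid_t)pid, SIGKILL);         /* ESRCH here is harmless too */
    fclose(fp);

    /* Remove the (now hopefully empty) group itself. */
    if (rmdir(path) < 0 && errno != ENOENT)
        return -1;
    return 0;
}

Whether blanket-ignoring ENOENT like this is acceptable in libvirt I
can't judge -- it would also hide a legitimately missing controller
file -- so please take it as a discussion starter rather than a
proposed fix.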