Hi Guido,

Thanks for your reply!

On Wed, 17 Nov 2021 13:11:05 +0100 Guido Günther <g...@godiug.net> wrote:
> here's the missing link:
> 
> https://salsa.debian.org/libvirt-team/libvirt/-/merge_requests/116

I took the artifacts from there and did another test run. At first glance, 
everything looked good, but then it failed on destruction of the domain again, 
with:

error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d4078\x2drouter.scope/system.slice/cgroup.controllers': No such file or directory

The domain is still fully cleaned up afterward (the cgroup subtree under
machine.slice is gone completely), but the error message is annoying. It looks
to me as if the recursion in the libvirt code is racing against the domain
cleaning up after itself, because I get different error messages and sometimes
the destroy also succeeds:

root@down ~ › virsh -c lxc:/// destroy router
Domain 'router' destroyed

root@down ~ › virsh -c lxc:/// start router
Domain 'router' started

root@down ~ › virsh -c lxc:/// destroy router
error: Failed to destroy domain 'router'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d4078\x2drouter.scope/system.slice/cgroup.controllers': No such file or directory

root@down ~ › virsh -c lxc:/// start router
Domain 'router' started

root@down ~ › virsh -c lxc:/// destroy router
error: Failed to destroy domain 'router'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1505\x2drouter.scope/system.slice/system-openvpn.slice/cgroup.controllers': No such file or directory

root@down ~ › dpkg -l | grep libvirt
ii  libvirt-clients                 7.0.0-4~1.gbp7decb2  amd64  Programs for the libvirt library
ii  libvirt-daemon                  7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon
ii  libvirt-daemon-config-network   7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (default network)
ii  libvirt-daemon-config-nwfilter  7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (default network filters)
ii  libvirt-daemon-driver-lxc       7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon LXC connection driver
ii  libvirt-daemon-driver-qemu      7.0.0-4~1.gbp7decb2  amd64  Virtualization daemon QEMU connection driver
ii  libvirt-daemon-system           7.0.0-4~1.gbp7decb2  amd64  Libvirt daemon configuration files
ii  libvirt-daemon-system-systemd   7.0.0-4~1.gbp7decb2  all    Libvirt daemon configuration files (systemd)
ii  libvirt0:amd64                  7.0.0-4~1.gbp7decb2  amd64  library for interfacing with different virtualization systems
ii  python3-libvirt                 7.0.0-2              amd64  libvirt Python 3 bindings
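
Since the failure in the transcript above is intermittent, a small loop can
repeat the start/destroy cycle until a destroy fails. This is only a sketch:
the function name first_failing_destroy and the injectable run parameter are
my own invention, and it assumes the domain is stopped when the loop begins.
The virsh invocations are the same ones used above.

```python
import subprocess


def first_failing_destroy(domain, attempts=20, uri="lxc:///", run=subprocess.run):
    """Cycle start/destroy until destroy fails.

    Returns (attempt_number, stderr_text) for the first failing destroy,
    or None if every attempt succeeded. `run` is injectable so the loop
    can be exercised without a real libvirt host (an assumption of this
    sketch, not anything virsh provides).
    """
    for attempt in range(1, attempts + 1):
        # Start must succeed, otherwise the destroy result is meaningless.
        run(["virsh", "-c", uri, "start", domain],
            check=True, capture_output=True, text=True)
        result = run(["virsh", "-c", uri, "destroy", domain],
                     capture_output=True, text=True)
        if result.returncode != 0:
            return attempt, result.stderr.strip()
    return None


if __name__ == "__main__":
    print(first_failing_destroy("router"))
```

Stopping at the first failure keeps the error text of exactly that run, which
matters here because the failing path differs between runs.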

I took a brief look at the code, but so far I have no idea how to deal with
this race gracefully. Unfortunately, a Google search did not bring up anything
useful either. Sometimes it takes many attempts (10+) to reproduce this :/.
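
The only naive direction I can think of is to treat a cgroup that disappears
during the walk as "already gone" instead of reporting an error. The following
Python sketch only illustrates that idea. It is not libvirt's code (which is
C), collect_controllers is a made-up name, and I am assuming that a missing
cgroup.controllers file always means the cgroup was removed concurrently.

```python
import os


def collect_controllers(path):
    """Recursively read cgroup.controllers below `path`.

    Any cgroup directory (or its cgroup.controllers file) that vanishes
    while we walk is skipped silently instead of raising an error.
    """
    result = {}
    try:
        with open(os.path.join(path, "cgroup.controllers")) as f:
            result[path] = f.read().split()
        children = [entry.path for entry in os.scandir(path)
                    if entry.is_dir(follow_symlinks=False)]
    except FileNotFoundError:
        # Removed between listing and reading: nothing left to do here.
        return result
    for child in children:
        result.update(collect_controllers(child))
    return result
```

Whether ignoring ENOENT like this would be safe inside
virCgroupKillRecursive() is something I cannot judge.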

kind regards,
Jonas

> 
> > Cheers,
> >  -- Guido
> > 
> > > Message-Id: <ea7d0ca37cce76e1327945c4864b996d7fd6d2e6.1618903455.git.mpriv...@redhat.com>
> > > From: Michal Privoznik <mpriv...@redhat.com>
> > > Date: Fri, 16 Apr 2021 16:39:14 +0200
> > > Subject: [PATCH] vircgroup: Fix virCgroupKillRecursive() wrt nested
> > >  controllers
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > > 
> > > I've encountered the following bug, but only on Gentoo with
> > > systemd and CGroupsV2. I've started an LXC container successfully
> > > but destroying it reported the following error:
> > > 
> > >   error: Failed to destroy domain 'amd64'
> > >   error: internal error: failed to get cgroup backend for 'pathOfController'
> > > 
> > > Debugging showed, that CGroup hierarchy is full of surprises:
> > > 
> > > /sys/fs/cgroup/machine.slice/machine-lxc\x2d861\x2damd64.scope/
> > > └── libvirt
> > >     ├── dev-hugepages.mount
> > >     ├── dev-mqueue.mount
> > >     ├── init.scope
> > >     ├── sys-fs-fuse-connections.mount
> > >     ├── sys-kernel-config.mount
> > >     ├── sys-kernel-debug.mount
> > >     ├── sys-kernel-tracing.mount
> > >     ├── system.slice
> > >     │   ├── console-getty.service
> > >     │   ├── dbus.service
> > >     │   ├── system-getty.slice
> > >     │   ├── system-modprobe.slice
> > >     │   ├── systemd-journald.service
> > >     │   ├── systemd-logind.service
> > >     │   └── tmp.mount
> > >     └── user.slice
> > > 
> > > For comparison, here's the same container on recent Rawhide:
