Package: libvirt-daemon-system
Version: 7.0.0-3
Severity: important

Hi,

Systemd has a class of boot-time races which can result in deadlock, which I learned more than I ever wanted to know about when Buster to Bullseye upgrades started leaving me with machines that were off the network after they were rebooted ...

The reason for that is a bit of a tangle of otherwise unrelated packages, and there are many ways this *could* happen, but the root of it in my particular case was the libvirt package switching to socket activation instead of letting the daemon create its own socket when it is ready to respond to requests on it.

The race occurs because the .socket unit creates the libvirt control socket very early in the boot, before even the network-pre target is reached, and so long before the libvirtd.service dependencies are satisfied and the daemon itself can be started to handle requests.

The deadlock in my case occurs when a udev rule for a device already attached at boot tries to assign that device to a VM. Prior to Bullseye, what would occur is:

The udev rule calls a small script on device hot/cold plug which checks a config file and, if the device is allocated to a VM, calls virsh to attach it to that VM. This 'immediately' either succeeds, fails because the desired VM is not actually running (yet), or fails because libvirtd is not running and virsh did not find its socket present. If either of the failure cases occurs, the calling script fails gracefully, and a QEMU hook will later handle attaching the device if/when libvirtd and the desired VM are actually started.

But in Bullseye there's a three-way race, and if the zombie socket is created before the udev rule runs, then virsh connects to it, but hangs indefinitely waiting for libvirtd.service to be able to start and respond to the request.

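The pre-Bullseye helper behaviour described above could be sketched roughly as follows. This is not the actual script: the allocation table, the device identifier, the XML path and the function name are all hypothetical stand-ins, and only `virsh attach-device` is a real command. It is only meant to show that every path returns promptly as long as virsh itself does.

```python
import subprocess

# Hypothetical device-to-VM allocation; the real script reads this from a
# config file whose format is not described in this report.
ALLOCATIONS = {"usb-1234:5678": ("guest1", "/etc/example/usb-1234-5678.xml")}


def attach_if_allocated(device_id, allocations=ALLOCATIONS, run=subprocess.run):
    """Try to attach device_id to its VM; return True only on success."""
    entry = allocations.get(device_id)
    if entry is None:
        return False  # device is not allocated to any VM, nothing to do
    domain, device_xml = entry
    try:
        # With a socket that exists but has no daemon behind it yet,
        # this is the call that never returns.
        result = run(
            ["virsh", "attach-device", domain, device_xml, "--live"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # virsh itself could not be executed
    # Non-zero exit: the VM is not running (yet) or libvirtd is unreachable.
    # Either way, fail gracefully and leave it to the QEMU hook.
    return result.returncode == 0
```

When the socket is absent virsh exits non-zero straight away and the function returns False; the trouble only starts once the socket exists with nothing behind it.
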
The deadlock in this specific case then happens when ifupdown-pre (but it could be any of many other things) calls udevadm settle to give the initial network devices a chance to be fully set up and available before networking.service brings them up. That call in turn hangs waiting for the (otherwise unrelated) udev rule above to complete, which won't happen until libvirtd is started, which won't happen until the udev rule returns (or udevadm settle times out) and network.target (among others) can be reached.

Everything stops for two minutes until the systemd "bug solver" of arbitrary timeouts starts killing things, and the machine finishes booting without any network devices.

The latter can be avoided (in most cases at least) with a tweak to the networking.service dependencies (the bug I've reported at https://bugs.debian.org/998088 has more of the gory details of this problem from the perspective of ifupdown's entanglement in it).

But we could avoid this specific incarnation of it completely if the libvirtd.socket unit declared the same ordering dependencies as libvirtd.service does, so that anything calling virsh, at any time, can reasonably expect an answer in finite time instead of blocking indefinitely to wait for a service (one that systemd already knows does not yet have even the basic preconditions to be eligible to start, but it ignores that and creates the socket anyway).

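The circular wait is perhaps easier to see written out as a wait-for graph. The following is only an illustrative model of the chain as described above: the node names are informal labels rather than exact unit names, the edges are a reading of the description and not something extracted from systemd, and `find_cycle` is a hypothetical helper.

```python
# Each key is blocked until everything in its list has finished.
WAITS_ON = {
    "ifupdown-pre (udevadm settle)": ["udev rule (virsh)"],
    "udev rule (virsh)": ["libvirtd.service"],
    "libvirtd.service": ["network.target"],
    "network.target": ["networking.service"],
    "networking.service": ["ifupdown-pre (udevadm settle)"],
}


def find_cycle(graph, start):
    """Follow the waits from start; return a cycle as a list, or None."""
    path = []

    def visit(node):
        if node in path:
            # Reached something already being waited on: a circular wait.
            return path[path.index(node):] + [node]
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        return None

    return visit(start)


if __name__ == "__main__":
    cycle = find_cycle(WAITS_ON, "ifupdown-pre (udevadm settle)")
    print(" -> ".join(cycle) if cycle else "no circular wait")
```

If the edge from the udev rule to libvirtd.service is removed, which is what virsh failing fast amounted to before Bullseye, the same search finds no cycle.
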
Unless systemd gets smarter about this, there may always be a race with the possibility of circular deadlocks if creation of the socket and responding to requests for it are not atomic with the creation of the service using it - so it may actually be better to just go back to letting the daemon create and manage the socket itself (as its "activation" signal to users of that socket) - but we can at least narrow the window for losing it significantly if we defer creation of the socket until at least the point where systemd thinks it can attempt to start the daemon (though with no guarantee of success at that still ...)

I hope I haven't missed anything that makes this make sense in the context of libvirt ... trying to look at and describe this from four entirely independent points of view, each that doesn't directly care about any of the others, is a bit of a hall of mirrors with small parts of the problem stuck to each of them!

  Cheers,
  Ron


-- System Information:
Debian Release: 11.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-9-amd64 (SMP w/12 CPU threads)
Locale: LANG=en_AU.utf8, LC_CTYPE=en_AU.utf8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libvirt-daemon-system depends on:
ii  adduser                         3.118
ii  debconf [debconf-2.0]           1.5.77
ii  gettext-base                    0.21-4
ii  iptables                        1.8.7-1
ii  libvirt-clients                 7.0.0-3
ii  libvirt-daemon                  7.0.0-3
ii  libvirt-daemon-config-network   7.0.0-3
ii  libvirt-daemon-config-nwfilter  7.0.0-3
ii  libvirt-daemon-system-systemd   7.0.0-3
ii  logrotate                       3.18.0-2
ii  policykit-1                     0.105-31

