The dbus race here comes from `netplan apply` running `networkctl reconfigure`[1], failing to talk to dbus, and then restarting systemd-networkd[2] at a point when systemd-networkd may itself still be coming up and is in an indeterminate state.
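
As a rough illustration of the sequence described above (this is not netplan's actual code; the function name `reconfigure_or_restart` and the injectable `run` parameter are hypothetical), the fallback can be sketched as: try `networkctl reconfigure`, and if that fails, restart systemd-networkd. If the first step fails only because the system bus is not up yet, the restart lands while systemd-networkd may still be starting.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical runner type: takes a command, returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def reconfigure_or_restart(
    interfaces: Sequence[str], run: Runner = _run
) -> List[List[str]]:
    """Sketch of the fallback described in the text (an assumption, not
    netplan's implementation). Returns the commands that were attempted."""
    attempted: List[List[str]] = []

    # Step 1: ask the running systemd-networkd to reconfigure the links.
    # This needs the system bus, so it can fail early in boot.
    reconfigure = ["networkctl", "reconfigure", *interfaces]
    attempted.append(reconfigure)
    if run(reconfigure) == 0:
        return attempted

    # Step 2: fall back to a full restart. If systemd-networkd is still
    # mid-startup at this point, this is where the race can occur.
    restart = ["systemctl", "restart", "systemd-networkd.service"]
    attempted.append(restart)
    run(restart)
    return attempted
```

Passing a stub for `run` makes it possible to exercise the failure path without touching the system, for example a stub that returns non-zero only for `networkctl`.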

[1] https://github.com/canonical/netplan/blob/main/netplan/cli/utils.py#L116
[2] https://github.com/canonical/netplan/blob/main/netplan/cli/commands/apply.py#L277

I'm guessing the restart from `netplan apply` is what triggers the occasional failure case where not all network config (such as IP addresses) is applied in systemd-networkd. It doesn't happen every time, but it is racy: systemd-networkd is mid-startup and we restart it again via `netplan apply`.

After discussion with waldi (Bastian Blank) in Debian land about the systemd dependency chain, it seems my suggestion about adding dbus.socket to cloud-init.service would actually introduce an ordering cycle, because dbus.socket is After=sysinit.target, yet cloud-init.service is Before=sysinit.target. So trying to shoehorn cloud-init into the dependency chain After=dbus.socket is impossible for systemd to schedule.

Maybe we'd want one of the following instead:

 1. `netplan apply` provides an option to avoid falling back to `networkctl reconfigure` and exits non-zero, so cloud-init can do something better or retry where necessary
 2. `netplan apply` defers or blocks/retries until dbus.socket/service is ready, so that this only affects cases where `netplan apply` is called
 3. cloud-init defers calling `netplan apply` on systemd-networkd environments until a later boot stage (cloud-config.service), which comes after sysinit.target (and can therefore expect dbus.socket to be started at that point in boot)

I'll add netplan here to see if there are thoughts or counter-suggestions.

** Also affects: netplan
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1997124

Title:
  Netplan/Systemd/Cloud-init/Dbus Race

Status in cloud-init:
  In Progress
Status in netplan:
  New
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Cloud-init is seeing intermittent failures while running `netplan apply`, which appear to be caused by a resource that is missing at the time of the call.

  The symptom in cloud-init logs looks like:

    Running ['netplan', 'apply'] resulted in stderr output: Failed to connect system bus: No such file or directory

  I think that this error[1] is likely caused by cloud-init running `netplan apply` too early in the boot process (before dbus is active).

  Today I stumbled upon this error, which was hit in MAAS[2]. We have also hit it intermittently during tests (we didn't have a reproducer).

  Realizing that this may not be a cloud-init error, but possibly a dependency bug between dbus and systemd, we decided to file this bug for broader visibility to other projects.

  I will follow up this initial report with some comments from our discussion earlier.

  [1] https://github.com/canonical/netplan/blob/main/src/dbus.c#L801
  [2] https://discourse.maas.io/t/latest-ubuntu-20-04-image-causing-netplan-error/5970

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1997124/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp