Public bug reported:

I have a VPS upgraded to Ubuntu 22.04 with a rather peculiar network
setup.  My /etc/netplan/90-vz-ens3.yaml is

    network:
      ethernets:
        ens3:
          addresses:
          - 80.209.x.y/32
          - 10.209.x.y/8
          - 2a02:7b40:x:y::1/128
          dhcp4: false
          dhcp6: false
          routes:
          - scope: link
            to: 169.254.0.1
            via: 0.0.0.0
          - to: default
            via: 169.254.0.1
          - to: default
            via: fe80::ffff:1:1
      version: 2

After a reboot there's a 50% chance that the VPS will fail to bring up
the default route and will be unreachable over the network, making me
reach for the emergency VNC console provided by the VPS provider.  When
this happens, journalctl will say

    Nov 10 04:31:56 egle.gedmin.as systemd-networkd[652]: ens3: Re-configuring with /run/systemd/network/10-netplan-ens3.network
    Nov 10 04:31:56 egle.gedmin.as systemd-networkd[652]: ens3: DHCPv6 lease lost
    Nov 10 04:31:56 egle.gedmin.as systemd-networkd[652]: ens3: Could not set route: Nexthop has invalid gateway. Network is unreachable
    Nov 10 04:31:56 egle.gedmin.as systemd-networkd[652]: ens3: Failed

and the routing table looks like this:

    10.0.0.0/8 dev ens3 proto kernel scope link src 10.209.225.198 
    169.254.0.1 dev ens3 proto static scope link 

When I VNC in, I can run `sudo netplan apply` and _usually_ the network
gets fixed.  More specifically, the default route gets added and `ip r`
reports

    default via 169.254.0.1 dev ens3 
    10.0.0.0/8 dev ens3 proto kernel scope link src 10.209.225.198 
    169.254.0.1 dev ens3 proto static scope link 

Today this happened again and I checked the netplan-generated
/run/systemd/network/10-netplan-ens3.network.  It looked correct to me:

    [Match]
    Name=ens3

    [Network]
    LinkLocalAddressing=ipv6
    Address=80.209.x.y/32
    Address=10.209.x.y/8
    Address=2a02:7b40:x:y::1/128

    [Route]
    Destination=169.254.0.1
    Gateway=0.0.0.0
    Scope=link

    [Route]
    Destination=0.0.0.0/0
    Gateway=169.254.0.1

    [Route]
    Destination=::/0
    Gateway=fe80::ffff:1:1

so this looks like an issue with systemd-networkd.

This bug is very similar to bug 2073869, except that _sometimes_ things
work and the VPS can reboot without losing network.  It might be a race
condition in systemd-networkd that applies the routes in the wrong
order (AFAIU the 169.254.0.1 route needs to exist before the default
route can be added?).

It looks rather similar to upstream bug
https://github.com/systemd/systemd/issues/28358 except there's no
suspend involved.


Background information, probably irrelevant to the bug:

The VPS is a KVM VM.  Its network configuration is confusing; there's
cloud-init that does something (and prints the routes to the journal on
boot), and then I see

    Nov 22 04:31:48 egle.gedmin.as qemu-ga[842]: info: guest-exec
    called: "prl_nettool set --hostname egle.gedmin.as --ip
    00:00:50:xx:xx:xx 80.209.x.y/255.255.255.255 10.209.x.y/255.0.0.0
    2a02:7b40:x:y::1/128 --gateway 00:00:50:xx:xx:xx 169.254.0.1
    fe80::ffff:1:1  --dns 00:00:50:xx:xx:xx 79.98.25.143 79.98.29.143"

which runs /usr/sbin/prl_nettool (part of the prl-nettoolp package,
installed locally, not from the Ubuntu archive), which runs a bunch of
shell scripts in /usr/lib/parallels-tools/tools/scripts/, which in turn
execute a Python script,
/usr/lib/parallels-tools/tools/scripts/netplan-cfg.py, that overwrites
/etc/netplan/90-vz-ens3.yaml on every boot and then calls netplan apply.

This makes it a bit more complicated (but not impossible) to apply
workarounds like `on-link: true` suggested in bug 2073869.
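
For reference, here is a sketch of what that workaround might look like
if applied to the netplan file above.  I have not tested this on the
VPS; it assumes netplan's `on-link` route option is rendered as
`GatewayOnLink=true` in the generated .network file, so that the default
route no longer depends on the 169.254.0.1 link route being installed
first.  Only the `on-link: true` line is new; everything else is copied
from the existing config.

```yaml
network:
  version: 2
  ethernets:
    ens3:
      dhcp4: false
      dhcp6: false
      addresses:
        - 80.209.x.y/32
        - 10.209.x.y/8
        - 2a02:7b40:x:y::1/128
      routes:
        - to: 169.254.0.1
          via: 0.0.0.0
          scope: link
        - to: default
          via: 169.254.0.1
          # Assumed workaround: treat the gateway as directly reachable
          # on ens3, regardless of which routes already exist.
          on-link: true
        - to: default
          via: fe80::ffff:1:1
```

Since netplan-cfg.py rewrites /etc/netplan/90-vz-ens3.yaml on every
boot, a manual edit like this would presumably be lost on the next
reboot unless that script is changed as well.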

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: systemd 249.11-0ubuntu3.12
ProcVersionSignature: Ubuntu 5.15.0-126.136-generic 5.15.167
Uname: Linux 5.15.0-126-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.6
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: nocloud
CloudName: unknown
CloudPlatform: nocloud
CloudSubPlatform: seed-dir (/var/lib/cloud/seed/nocloud-net)
Date: Fri Nov 22 08:34:52 2024
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Lsusb-t: /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
MachineType: Virtuozzo KVM
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-126-generic root=UUID=38220ddd-8acc-476c-a7a6-7eec445d7a28 ro maybe-ubiquity
SourcePackage: systemd
UpgradeStatus: Upgraded to jammy on 2024-09-11 (71 days ago)
dmi.bios.date: 04/01/2014
dmi.bios.release: 0.0
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.11.0-2.vz7.2
dmi.chassis.type: 1
dmi.chassis.vendor: Virtuozzo
dmi.chassis.version: Virtuozzo 7.6.0 PC (i440FX + PIIX, 1996)
dmi.modalias: dmi:bvnSeaBIOS:bvr1.11.0-2.vz7.2:bd04/01/2014:br0.0:svnVirtuozzo:pnKVM:pvrVirtuozzo7.6.0PC(i440FX+PIIX,1996):cvnVirtuozzo:ct1:cvrVirtuozzo7.6.0PC(i440FX+PIIX,1996):sku:
dmi.product.family: Virtuozzo Linux
dmi.product.name: KVM
dmi.product.version: Virtuozzo 7.6.0 PC (i440FX + PIIX, 1996)
dmi.sys.vendor: Virtuozzo

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug jammy uec-images

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2089342

Title:
  systemd-networkd randomly fails to set route because "Nexthop has
  invalid gateway"

Status in systemd package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2089342/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
