Public bug reported:

Cloud-init's hotplug / policy-based routing (PBR) feature for multi-NIC
instances on EC2 is broken, due, I believe, to the following issue:

Steps to reproduce:

1. Launch an Oracular VM on EC2.
2. Attach a NIC with an associated IPv4 floating IP.

The issue:

Cloud-init's hotplug functionality generates and applies the
following netplan configuration in /etc/netplan/50-cloud-init.yaml:

```
network:
  version: 2
  ethernets:
    ens5:
      match:
        macaddress: "06:1c:fb:9b:d3:e5"
      dhcp4: true
      dhcp4-overrides:
        route-metric: 100
      dhcp6: true
      dhcp6-overrides:
        route-metric: 100
      set-name: "ens5"
    ens6:
      match:
        macaddress: "06:cb:49:de:56:51"
      dhcp4: true
      dhcp4-overrides:
        use-routes: true
        route-metric: 200
      dhcp6: false
      set-name: "ens6"
      routes:
      - table: 101
        to: "0.0.0.0/0"
        via: "192.168.0.1"
      - scope: "link"
        table: 101
        to: "192.168.0.0/20"
      routing-policy:
      - table: 101
        from: "192.168.12.94"
```
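As a reading aid, the ens6 `routes` and `routing-policy` stanzas can be expressed as roughly equivalent iproute2 commands. This is a minimal sketch, not how netplan renders its backends: the dictionary is a hand-copied subset of the YAML above, and the helper name `pbr_commands` is hypothetical.

```python
# Hypothetical helper: translate the ens6 policy-routing stanzas from the
# netplan YAML above into roughly equivalent iproute2 commands.
# Illustration only; this is NOT netplan's renderer.

ENS6 = {
    "routes": [
        {"table": 101, "to": "0.0.0.0/0", "via": "192.168.0.1"},
        {"scope": "link", "table": 101, "to": "192.168.0.0/20"},
    ],
    "routing-policy": [
        {"table": 101, "from": "192.168.12.94"},
    ],
}


def pbr_commands(dev, cfg):
    """Return ip(8) command strings matching the given netplan stanzas."""
    cmds = []
    for route in cfg.get("routes", []):
        # iproute2 spells 0.0.0.0/0 as "default"
        dest = "default" if route["to"] == "0.0.0.0/0" else route["to"]
        parts = ["ip", "route", "add", dest]
        if "via" in route:
            parts += ["via", route["via"]]
        parts += ["dev", dev]
        if "scope" in route:
            parts += ["scope", route["scope"]]
        if "table" in route:
            parts += ["table", str(route["table"])]
        cmds.append(" ".join(parts))
    for rule in cfg.get("routing-policy", []):
        # Source-based rule: traffic from this address uses the given table
        parts = ["ip", "rule", "add"]
        if "from" in rule:
            parts += ["from", rule["from"]]
        parts += ["table", str(rule["table"])]
        cmds.append(" ".join(parts))
    return cmds


for cmd in pbr_commands("ens6", ENS6):
    print(cmd)
```

These routes target table 101 rather than the main table, so they would be inspected with `ip route show table 101` and `ip rule show`; the plain `ip route` output below shows only the main table.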

However, the routes actually installed are:

```
$ ip route
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.0.64 metric 1002 mtu 9001
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.94 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto dhcp scope link src 192.168.0.64 metric 1002 mtu 9001
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.94 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:1c:fb:9b:d3:e5 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 192.168.0.64/20 brd 192.168.15.255 scope global dynamic ens5
       valid_lft 3513sec preferred_lft 3063sec
    inet6 2a05:d011:311:a00:c57a:7c3e:723a:8efc/128 scope global dynamic noprefixroute
       valid_lft 415sec preferred_lft 105sec
    inet6 fe80::41c:fbff:fe9b:d3e5/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 06:cb:49:de:56:51 brd ff:ff:ff:ff:ff:ff
    altname enp0s6
    inet 192.168.12.94/20 brd 192.168.15.255 scope global dynamic noprefixroute ens6
       valid_lft 3514sec preferred_lft 3064sec
    inet6 fe80::4cb:49ff:fede:5651/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
```

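The main routing table lacks these routes for ens6:

192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.94 metric 200
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.94 metric 200

and the metrics do not match the netplan config: the default and subnet
routes use 1002 (ens5) and 1003 (ens6) instead of the configured
route-metric values of 100 and 200.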
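The missing routes can also be spotted mechanically. The sketch below (standard library only; the helper names `route_keys` and `missing_routes` are hypothetical) reduces each `ip route` line to a (destination, device) pair, ignoring `proto`, `src`, `metric` and `mtu`, and reports which expected pairs are absent. The sample is the Oracular output above with wrapped lines rejoined; the expected set is modeled on the working Noble table reported below.

```python
# Hypothetical checker: which (destination, device) routes are missing
# from an `ip route` dump? proto, src, metric and mtu are ignored on
# purpose, so only route presence is compared.

ORACULAR = """\
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.0.64 metric 1002 mtu 9001
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.94 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto dhcp scope link src 192.168.0.64 metric 1002 mtu 9001
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.94 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100
"""


def route_keys(dump):
    """Return the set of (destination, device) pairs in an `ip route` dump."""
    keys = set()
    for line in dump.splitlines():
        fields = line.split()
        if "dev" not in fields:
            continue  # skip blank lines and routes without a device
        dev = fields[fields.index("dev") + 1]
        keys.add((fields[0], dev))
    return keys


def missing_routes(dump, expected):
    """Return the expected (destination, device) pairs absent from the dump."""
    return sorted(set(expected) - route_keys(dump))


# Every NIC is expected to carry the same four main-table routes.
EXPECTED = [
    (dest, dev)
    for dev in ("ens5", "ens6")
    for dest in ("default", "192.168.0.0/20", "192.168.0.1", "192.168.0.2")
]

print(missing_routes(ORACULAR, EXPECTED))
# [('192.168.0.1', 'ens6'), ('192.168.0.2', 'ens6')]
```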

With this network configuration, the instance is not reachable from
outside using the floating IP associated with ens6.

Repeating the same steps on Ubuntu Noble with an equivalent netplan
config yields the following routing table, and the instance is
reachable via ens6:

```
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.9.45 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.179 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.9.45 metric 100
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.12.179 metric 200
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.9.45 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.179 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.9.45 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.179 metric 200
```
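The metric mismatch can be checked in the same spirit by comparing each route's `metric` against the `route-metric` configured for its interface in the netplan YAML (100 for ens5, 200 for ens6). This is a minimal sketch with a hypothetical helper name; it assumes every main-table route on an interface should carry that interface's metric, which is what the Noble table shows, and is a simplification rather than a statement of netplan semantics.

```python
# Hypothetical check: does every route's metric match the route-metric
# configured in netplan (dhcp4-overrides) for its interface?

CONFIGURED = {"ens5": 100, "ens6": 200}

NOBLE = """\
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.9.45 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.179 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.9.45 metric 100
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.12.179 metric 200
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.9.45 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.179 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.9.45 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.179 metric 200
"""

ORACULAR = """\
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.0.64 metric 1002 mtu 9001
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.94 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto dhcp scope link src 192.168.0.64 metric 1002 mtu 9001
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.94 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.0.64 metric 100
"""


def metric_mismatches(dump, configured):
    """Return (destination, device, actual, expected) for misaligned routes."""
    bad = []
    for line in dump.splitlines():
        fields = line.split()
        if "dev" not in fields or "metric" not in fields:
            continue  # skip blank lines and routes without a metric
        dev = fields[fields.index("dev") + 1]
        metric = int(fields[fields.index("metric") + 1])
        want = configured.get(dev)
        if want is not None and metric != want:
            bad.append((fields[0], dev, metric, want))
    return bad


print(metric_mismatches(NOBLE, CONFIGURED))     # []
print(metric_mismatches(ORACULAR, CONFIGURED))  # four default/subnet routes
```

On the Oracular sample only the two ens5 on-link host routes (192.168.0.1 and 192.168.0.2) carry the configured metric; the default and subnet routes on both NICs do not.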

I have tried downgrading cloud-init to 24.4~2g2e4c39b7-0ubuntu1 in
oracular, and I consistently get the same behavior. This rules out the
issue having been introduced in cloud-init 24.3 or 23.3.1.

This scenario is covered by cloud-init CI, and the last passing run
was on Sep 1, 2024.

** Affects: netplan.io (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: netplan.io (Ubuntu Oracular)
     Importance: Undecided
         Status: New

** Attachment added: "cloud-init.tar.gz"
   
https://bugs.launchpad.net/bugs/2083695/+attachment/5824925/+files/cloud-init.tar.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2083695

Title:
  netplan does not fully generate routes for PBR

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
