(k)rafaeldtinoco@kcluster01:~$ dpkg -l | grep "ii systemd "
ii systemd 243-3ubuntu1 amd64 system and service manager
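A quick consistency check like the one below is useful before trusting the cluster-wide results; this is a minimal sketch with the three version strings hardcoded (on a live cluster they would come from the `ssh` loop shown in this comment):

```shell
#!/bin/sh
# Hypothetical sample: the systemd version reported by each of the three
# nodes (hardcoded here so the sketch is self-contained).
versions="243-3ubuntu1
243-3ubuntu1
243-3ubuntu1"

# If every node runs the same systemd build, `sort -u` leaves one line.
n=$(printf '%s\n' "$versions" | sort -u | wc -l)
if [ "$n" -eq 1 ]; then
    echo "all nodes run the same systemd build"
else
    echo "version mismatch across nodes"
fi
```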
(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "dpkg -l | grep systemd "; done | grep "ii systemd "
ii systemd 243-3ubuntu1 amd64 system and service manager
ii systemd 243-3ubuntu1 amd64 system and service manager
ii systemd 243-3ubuntu1 amd64 system and service manager

----

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "cat /etc/systemd/network/10-netplan-eth3.network"; done
[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.2/24
KeepConfiguration=static

[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.3/24
KeepConfiguration=static

[Match]
Name=eth3

[Network]
LinkLocalAddressing=ipv6
Address=10.0.3.4/24
KeepConfiguration=static

----

(k)rafaeldtinoco@kcluster01:~$ crm status
Stack: corosync
Current DC: kcluster01 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Nov 19 16:38:15 2019
Last change: Mon Nov 18 12:41:14 2019 by root via crm_resource on kcluster01

3 nodes configured
5 resources configured

Online: [ kcluster01 kcluster02 kcluster03 ]

Full list of resources:

 fence_kcluster01 (stonith:fence_virsh): Started kcluster02
 fence_kcluster02 (stonith:fence_virsh): Started kcluster01
 fence_kcluster03 (stonith:fence_virsh): Started kcluster01
 Resource Group: webserver_virtual_ip
     webserver  (systemd:lighttpd):       Started kcluster01
     virtual_ip (ocf::heartbeat:IPaddr2): Started kcluster01

----

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "hostname ; ip addr show eth3"; done
kcluster01
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:a0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:a003/64 scope link
       valid_lft forever preferred_lft forever
kcluster02
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1d:1a:cc brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.3/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1d:1acc/64 scope link
       valid_lft forever preferred_lft forever
kcluster03
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:13:16 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:1316/64 scope link
       valid_lft forever preferred_lft forever

----

In parallel:

(k)rafaeldtinoco@kcluster01:~$ journalctl -f -u pacemaker

and check whether events are generated (the VIP monitor detects address changes).

----

(k)rafaeldtinoco@kcluster01:~$ systemctl restart systemd-networkd

----

No VIP changes:

(k)rafaeldtinoco@kcluster01:~$ for name in kcluster01 kcluster02 kcluster03; do ssh $name "hostname ; ip addr show eth3"; done
kcluster01
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:11:a0:03 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe11:a003/64 scope link
       valid_lft forever preferred_lft forever
kcluster02
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1d:1a:cc brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.3/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1d:1acc/64 scope link
       valid_lft forever preferred_lft forever
kcluster03
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:13:16 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.4/24 brd 10.0.3.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb0:1316/64 scope link
       valid_lft forever preferred_lft forever

and no events were generated!

verification-done

** Tags removed: verification-needed verification-needed-eoan
** Tags added: verification-done verification-done-eoan

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to keepalived in Ubuntu.
https://bugs.launchpad.net/bugs/1815101

Title:
  [master] Restarting systemd-networkd breaks keepalived, heartbeat,
  corosync, pacemaker (interface aliases are restarted)

Status in Keepalived Charm: New
Status in netplan: Confirmed
Status in heartbeat package in Ubuntu: Triaged
Status in keepalived package in Ubuntu: In Progress
Status in systemd package in Ubuntu: In Progress
Status in heartbeat source package in Bionic: Triaged
Status in keepalived source package in Bionic: Confirmed
Status in systemd source package in Bionic: Confirmed
Status in heartbeat source package in Disco: Triaged
Status in keepalived source package in Disco: Confirmed
Status in systemd source package in Disco: Confirmed
Status in heartbeat source package in Eoan: Triaged
Status in keepalived source package in Eoan: In Progress
Status in systemd source package in Eoan: Fix Committed

Bug description:

  [impact]

  - ALL related HA software has a problem when interfaces are managed by
    systemd-networkd: NIC restarts/reconfigurations will wipe all
    interface aliases when the HA software is not expecting it (there is
    no coordination between them).

  - keepalived, smb ctdb and pacemaker all suffer from this. Pacemaker
    is smarter in this case because it has a service monitor that will
    restart the virtual IP resource, on the affected node & NIC, before
    considering it a real failure, but other HA services might consider
    it a real failure when it is not.
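For reference, the fix surfaces as exactly the stanza shown in the 10-netplan-eth3.network files above; a minimal hand-written sketch of such a networkd unit follows (the interface name and address are illustrative placeholders, not taken from a live system):

```ini
# /etc/systemd/network/10-netplan-eth3.network (sketch)
[Match]
Name=eth3

[Network]
Address=10.0.3.2/24
# Without this, restarting systemd-networkd reconfigures the link and
# drops addresses networkd did not add itself (e.g. a cluster VIP).
# "static" asks networkd to keep static addresses and routes in place
# across restarts/reconfigurations.
KeepConfiguration=static
```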
  [test case]

  - comment #14 is a full test case: set up a 3-node pacemaker cluster,
    as in that example, and cause a networkd service restart: it will
    trigger a failure for the virtual IP resource monitor.

  - another example is given in the original description for keepalived.
    Both suffer from the same issue (and other HA software as well).

  [regression potential]

  - this backports the KeepConfiguration parameter, which adds some
    significant complexity to networkd's configuration and behavior, and
    could lead to regressions in correctly configuring the network at
    networkd start, incorrectly maintaining configuration at networkd
    restart, or losing network state at networkd stop.

  - any regressions are most likely to occur during networkd start,
    restart, or stop, and most likely to involve missing or incorrect
    ip address(es).

  - the change is based on upstream patches adding the exact feature we
    needed to fix this issue, and it will be integrated with a netplan
    change adding the needed stanza (KeepConfiguration=) to the systemd
    NIC configuration file.

  [other info]

  original description:

  ---

  Configure netplan for interfaces, for example (a working config with
  IP addresses obfuscated):

  network:
    ethernets:
      eth0:
        addresses: [192.168.0.5/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth2:
        addresses:
          - 12.13.14.18/29
          - 12.13.14.19/29
        gateway4: 12.13.14.17
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth3:
        addresses: [10.22.11.6/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth4:
        addresses: [10.22.14.6/24]
        dhcp4: false
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
      eth7:
        addresses: [9.5.17.34/29]
        dhcp4: false
        optional: true
        nameservers:
          search: [blah.com, other.blah.com, hq.blah.com, cust.blah.com, phone.blah.com]
          addresses: [10.22.11.1]
    version: 2

  Configure keepalived (again, a working config with IP addresses
  obfuscated):

  global_defs {                      # Block id
      notification_email {
          sysadm...@blah.com
      }
      notification_email_from keepali...@system3.hq.blah.com
      smtp_server 10.22.11.7         # IP
      smtp_connect_timeout 30        # integer, seconds
      router_id system3              # string identifying the machine,
                                     # (doesn't have to be hostname).
      vrrp_mcast_group4 224.0.0.18   # optional, default 224.0.0.18
      vrrp_mcast_group6 ff02::12     # optional, default ff02::12
      enable_traps                   # enable SNMP traps
  }

  vrrp_sync_group collection {
      group {
          wan
          lan
          phone
      }
  }

  vrrp_instance wan {
      state MASTER
      interface eth2
      virtual_router_id 77
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass BlahBlah
      }
      virtual_ipaddress {
          12.13.14.20
      }
  }

  vrrp_instance lan {
      state MASTER
      interface eth3
      virtual_router_id 78
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass MoreBlah
      }
      virtual_ipaddress {
          10.22.11.13/24
      }
  }

  vrrp_instance phone {
      state MASTER
      interface eth4
      virtual_router_id 79
      priority 150
      advert_int 1
      smtp_alert
      authentication {
          auth_type PASS
          auth_pass MostBlah
      }
      virtual_ipaddress {
          10.22.14.3/24
      }
  }

  At boot the affected interfaces have:

  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet 10.22.14.3/24 scope global secondary eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet 10.22.11.13/24 scope global secondary eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.13.14.20/32 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever

  Run 'netplan try' (didn't even make any changes to the configuration)
  and the keepalived addresses disappear, never to return; the affected
  interfaces have:

  5: eth4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:90:c0:e3 brd ff:ff:ff:ff:ff:ff
      inet 10.22.14.6/24 brd 10.22.14.255 scope global eth4
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:fe90:c0e3/64 scope link
         valid_lft forever preferred_lft forever
  7: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:29 brd ff:ff:ff:ff:ff:ff
      inet 10.22.11.6/24 brd 10.22.11.255 scope global eth3
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:2629/64 scope link
         valid_lft forever preferred_lft forever
  9: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether ab:cd:ef:b0:26:2b brd ff:ff:ff:ff:ff:ff
      inet 12.13.14.18/29 brd 12.13.14.23 scope global eth2
         valid_lft forever preferred_lft forever
      inet 12.33.89.19/29 brd 12.13.14.23 scope global secondary eth2
         valid_lft forever preferred_lft forever
      inet6 fe80::ae1f:6bff:feb0:262b/64 scope link
         valid_lft forever preferred_lft forever

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-keepalived/+bug/1815101/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
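As a footnote to the reproduction above: the addresses that vanish are the "secondary" ones. They can be filtered out of one-line `ip -o addr show` output; a minimal sketch against a hardcoded sample (the sample mirrors the kcluster01 output above; on a live system you would pipe `ip -o addr show eth3` instead):

```shell
#!/bin/sh
# Sample of `ip -o addr show eth3` output (one line per address),
# hardcoded here so the sketch is self-contained.
sample='5: eth3    inet 10.0.3.2/24 brd 10.0.3.255 scope global eth3
5: eth3    inet 10.0.3.1/24 brd 10.0.3.255 scope global secondary eth3
5: eth3    inet6 fe80::5054:ff:fe11:a003/64 scope link'

# Print only the secondary (alias) addresses -- the ones a networkd
# restart wipes when KeepConfiguration= is not set.
printf '%s\n' "$sample" | awk '/ secondary /{ print $4 }'
# prints: 10.0.3.1/24
```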