Rid opened a new issue, #10281: URL: https://github.com/apache/cloudstack/issues/10281
### problem

We have a new CloudStack 4.20.0.0 environment with redundant VPC routers, but they never transition to MASTER. Instead:

1. Each VR tries to bring up the public interface (eth1) and add a default route, e.g.:
   ```
   ip route add default via x.x.x.x table Table_eth1 proto static
   ```
   …but this fails with exit code 2 (“Nexthop has invalid gateway”). We believe it fails as the interface remains in the DOWN state.
2. The VR script then tears eth1 down, inserts a “throw x.x.x.0/27” route in Table_eth1, and marks the router as BACKUP or FAULT.
3. Keepalived never starts because the script believes routing is broken. Thus no VRRP negotiation occurs, and no router becomes MASTER.

We can manually bring eth1 up (`ip link set eth1 up`) and add a default route to the main or custom table, and it works fine. However, CloudStack’s scripts immediately revert the interface to DOWN again and keep the router in BACKUP.

**Key details:**

VR logs show repeated attempts to configure the default route via x.x.x.x inside Table_eth1, followed by throw x.x.x.0/27. Even if we remove the throw route, the script tries to add a route while eth1 is still down, fails, and resets to BACKUP. Because of this cycle, we never see /etc/keepalived/keepalived.conf generated or keepalived started.

### versions

Apache CloudStack: 4.20.0.0
System VM template: Debian GNU/Linux 12
Hypervisor: KVM
Networking: Advanced networking with VLAN trunking, rp_filter disabled

We modified the systemvm template to add a static route which our setup needs. We added `/etc/network/if-up.d/91-add-route`:

```
#!/bin/sh
#
# /etc/network/if-up.d/91-add-route
#
# This script is automatically invoked by ifup each time
# an interface is brought up. The environment variable $IFACE
# contains the interface name (e.g., eth0, ens3, etc.).

[ "$IFACE" = "lo" ] && exit 0

# Gather *all* IPv4 addresses (CIDR format) on this interface
IP_CIDR_LIST=$(ip -o -4 addr show dev "$IFACE" | awk '{print $4}')
[ -z "$IP_CIDR_LIST" ] && exit 0  # no IPv4 addresses on $IFACE, so exit

# Loop through each IPv4 address on this interface
for IP_CIDR in $IP_CIDR_LIST
do
    # Extract the actual IP address (without /mask)
    IP_ADDR=$(echo "$IP_CIDR" | cut -d '/' -f 1)

    # Check if IP is in x.x.x.x/27
    if echo "$IP_ADDR" | grep -Eq '^-redacted-$'; then
        echo "Interface $IFACE has IP $IP_ADDR in x.x.x.x/27; adding route..."
        ip route add x.x.x.x/27 dev "$IFACE" scope link src "$IP_ADDR" 2>/dev/null || true
        # Once we've added the route for the first matching IP, we're done.
        exit 0
    fi
done

exit 0
```

We do not believe this is related to the issue.

### The steps to reproduce the bug

1. Install or upgrade to CloudStack 4.20.0.0 with advanced networking.
2. Create a VPC offering that uses redundant VR.
3. Deploy a VPC that picks up two VRs.
4. Observe in /var/log/cloud.log (and the VR’s cloud.log) that each router fails to add its default route via x.x.x.x, then tears down eth1 and remains BACKUP/FAULT indefinitely.

### What to do about it?

Ideally, the VR script should:

1. Ensure eth1 is brought up before adding the default route in the policy routing table (Table_eth1).
2. Avoid placing a “throw” route for x.x.x.0/27 on the router that’s intended to be MASTER.
3. Generate and start keepalived once the router is designated MASTER (or “PRIMARY” per the cmdline), so it can finalize the interface config instead of reverting to BACKUP.

If you need more logs or specifics, we can provide full VR logs and examples of the failing ip route commands. Let us know if you have any questions or potential workarounds. Thanks!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
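For illustration, the ordering requested in point 1 of "What to do about it?" could look roughly like the sketch below. It is not CloudStack's actual VR code: the function name `add_default_route` and the `IP_CMD` override are hypothetical, introduced only for this example. It assumes, as suspected in the problem description, that the route add fails because the link is still DOWN, so it brings the link up first and only then installs the default route in the per-interface table.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from the CloudStack systemvm scripts.
# IP_CMD is an assumed override so the sequence can be dry-run
# (e.g. IP_CMD=echo) without touching real interfaces.
IP_CMD="${IP_CMD:-ip}"

# add_default_route IFACE GATEWAY TABLE
# Brings the link up, then installs the default route in TABLE.
# Returns non-zero if either step fails, so a caller can decide
# what to do instead of continuing with a broken routing table.
add_default_route() {
    iface="$1"
    gateway="$2"
    table="$3"

    # Step 1: make sure the interface is administratively up.
    $IP_CMD link set dev "$iface" up || return 1

    # Step 2: add (or update) the default route in the policy table.
    # "replace" is used so that re-running the function is harmless.
    $IP_CMD route replace default via "$gateway" dev "$iface" \
        table "$table" proto static
}
```

As a dry run, `IP_CMD=echo` followed by `add_default_route eth1 192.0.2.1 Table_eth1` prints the two `ip` argument lists in order instead of executing them (192.0.2.1 is a documentation placeholder, not our real gateway).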