hodie-aurora opened a new issue, #11431: URL: https://github.com/apache/cloudstack/issues/11431
### problem

- Issue Type: Bug Report
- CloudStack Version: 4.20.x (based on Virtual Router version 4.20.0)
- Hypervisor: KVM
- Kubernetes Template: setup-v1.33.1-calico-x86_64.iso

Summary:

When creating a high-availability (HA) Kubernetes cluster using the CloudStack Kubernetes Service (CKS) in a VPC, the cluster initialization process gets stuck and eventually times out. The root cause appears to be that only one of the automatically created port forwarding rules for the control plane nodes is functional, preventing the management server and nodes from communicating with each other.

Steps to Reproduce:

1. Create a new VPC with a single network tier (e.g., 10.1.0.0/24).
2. Navigate to Compute -> Kubernetes and start creating a new HA cluster.
3. Select the setup-v1.33.1-calico-x86_64.iso template.
4. Configure it with 3 control plane nodes and 1 worker node.
5. Select the VPC network created in step 1.
6. Crucially, leave the "Load Balancer IP" field empty, allowing CloudStack to automatically acquire a public IP and create the necessary rules.
7. Launch the cluster.

Expected Results:

The cluster VMs are created, all necessary port forwarding rules (for SSH on port 22 and the K8s API on port 6443) are created on the VPC's virtual router, and the cluster successfully initializes, reaching a "Running" state. All nodes should be accessible via their respective forwarded ports.

Actual Results:

- The cluster VMs are created successfully.
- All port forwarding rules are listed correctly in the CloudStack UI.
- The cluster state remains "Starting" for a long time, with management server logs repeatedly showing: Waiting for Kubernetes cluster ... control node VMs to be accessible.
- After a timeout, the cluster enters an "Error" state.
- Network tests (telnet <public_ip> <forwarded_port>) reveal that only one of the SSH port forwarding rules works. The others fail with a No route to host error.
- The single working rule is not always for the same node across different creation attempts.
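The manual telnet checks described above can be repeated across all forwarded ports with a small script. The following is a minimal sketch, not part of CloudStack: the function names `probe_port` and `probe_ports` and the status labels are hypothetical, chosen only for illustration. It uses Python's standard `socket` module to attempt a TCP connection and to tell apart a refused connection, a timeout, and a host-unreachable error (which is what telnet reports as "No route to host").

```python
import errno
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    The returned labels ("open", "timeout", "refused", "unreachable")
    are illustrative names, not CloudStack terminology.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a TCP reset: nothing is listening.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is what telnet prints as "No route to host".
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"
        return f"error: {exc}"


def probe_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Probe several ports on one host and return {port: status}."""
    return {port: probe_port(host, port, timeout) for port in ports}
```

Calling `probe_ports("192.168.10.225", [2222, 2223, 2224, 2225, 6443])` from a machine that can reach the public IP would give a per-port status for the rules listed in this report. Separating "unreachable" from "timeout" and "refused" may help narrow down whether an ICMP rejection is being returned somewhere along the path.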
Diagnostics Performed:

Port Forwarding Test:

    # telnet 192.168.10.225 2224 --> Connected successfully
    # telnet 192.168.10.225 2222 --> telnet: Unable to connect to remote host: No route to host
    # telnet 192.168.10.225 2223 --> telnet: Unable to connect to remote host: No route to host
    # telnet 192.168.10.225 2225 --> telnet: Unable to connect to remote host: No route to host

Virtual Router iptables Inspection:

I logged into the VPC's virtual router (r-53-VM) and confirmed that all DNAT rules are present and correct. This indicates that the virtual router's DNAT configuration is not the problem.

    # iptables-save | grep DNAT
    -A PREROUTING -d 192.168.10.225/32 -p tcp -m tcp --dport 6443 -j DNAT --to-destination 10.1.0.209:6443
    -A PREROUTING -d 192.168.10.225/32 -p tcp -m tcp --dport 2222 -j DNAT --to-destination 10.1.0.209:22
    -A PREROUTING -d 192.168.10.225/32 -p tcp -m tcp --dport 2223 -j DNAT --to-destination 10.1.0.80:22
    -A PREROUTING -d 192.168.10.225/32 -p tcp -m tcp --dport 2224 -j DNAT --to-destination 10.1.0.254:22
    -A PREROUTING -d 192.168.10.225/32 -p tcp -m tcp --dport 2225 -j DNAT --to-destination 10.1.0.72:22
    ... (and corresponding OUTPUT chain rules)

Hypothesis:

Since the DNAT rules on the virtual router are in place, the No route to host error strongly suggests that the packets are being rejected by the destination K8s node VMs themselves. The most likely cause is a default-on firewall (like firewalld or ufw) within the setup-v1.33.1-calico-x86_64.iso template. Such a firewall would block the incoming SSH connections forwarded by the virtual router, preventing cluster setup. The fact that one node is sometimes accessible might be due to timing, where one node's firewall is temporarily disabled during its initial setup phase before the rest of the cluster setup fails.

Request:

Could the development team please investigate if the CKS templates have a firewall enabled by default?
If so, it seems to break the HA cluster creation process in a VPC, and the firewall should either be disabled or pre-configured to allow traffic from the VPC's private network range.

### versions

- CloudStack Version: ~4.20.1.0 (inferred from Virtual Router software version)
- Hypervisor: KVM
- Primary Storage: NFS (inferred from compute offering name 8cpu-16g-nfs)
- Network: CloudStack VPC with an Isolated Guest Network and Virtual Router
- CKS Template: setup-v1.33.1-calico-x86_64.iso

### The steps to reproduce the bug

1. Create a new VPC with a single network tier (e.g., 10.1.0.0/24).
2. Navigate to Compute -> Kubernetes and start creating a new HA cluster.
3. Select the setup-v1.33.1-calico-x86_64.iso template.
4. Configure it with 3 control plane nodes and 1 worker node.
5. Select the VPC network created in step 1.
6. Leave the "Load Balancer IP" field empty to allow CloudStack to automatically acquire an IP.
7. Launch the cluster and observe that it fails to initialize, with only one forwarded port being accessible.

### What to do about it?

Proposed Long-Term Fix:

The CKS template setup-v1.33.1-calico-x86_64.iso should be modified. The firewall inside the template should either be disabled by default or, preferably, be pre-configured with rules that allow all traffic needed for cluster setup. At a minimum, it should allow inbound traffic from the VPC's private network CIDR (e.g., 10.1.0.0/24 in this case) on the required ports (such as SSH port 22 and Kubernetes API port 6443) so that cluster initialization can complete without manual intervention.

Immediate Workaround for Users:

For users encountering this bug, a potential workaround is to use the noVNC console from the CloudStack UI to access each Kubernetes node VM. Once logged in, manually disable the firewall.
For example:

On systems that use firewalld:

    systemctl stop firewalld && systemctl disable firewalld

On Debian/Ubuntu systems that use ufw:

    ufw disable

After disabling the firewalls on all nodes, the cluster setup process should be able to proceed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org