[I] CKS cluster remains in Alert state if the scaling fails due to capacity issue on the hypervisor host [cloudstack]

via GitHub Tue, 24 Feb 2026 04:37:55 -0800


kiranchavala opened a new issue, #12699:
URL: https://github.com/apache/cloudstack/issues/12699


   ### problem
   
   A CKS cluster remains in the Alert state when scaling fails because of a capacity issue on the hypervisor host.
   
   ### versions
   
   ACS 4.22
   
   ### The steps to reproduce the bug
   
   Have a CloudStack environment with 2 KVM hosts in a cluster.
   
   1. Launch a CKS cluster with size 2 (worker nodes).
   
   The worker nodes are deployed on KVM host 2.
   
   2. The CKS cluster is in the Running state.
   
   3. Deploy other VMs in the CloudStack environment until the KVM hosts have reached their capacity.
   
   4. Scale the CKS cluster to size 3.
   
   5. Scaling of the CKS cluster fails due to the capacity issue.
   
   The new worker node is left in the Stopped state.
   
   6. The CKS cluster moves to the Alert state.
   
   ```
   
   2026-02-24 11:12:14,223 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":16,"instanceName":"i-2-16-VM","state":"Stopped","type":"User","uuid":"47386d74-3c9f-49aa-b102-1c10537c8350"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Stopped while expected to be in state: Running. So moving the cluster to Alert state for reconciliation
   2026-02-24 11:12:14,224 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":9,"instanceName":"i-2-9-VM","state":"Running","type":"User","uuid":"ebf0a5a6-01b7-462a-bad6-1f61887f0f41"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Running while expected to be in state: Stopped. So moving the cluster to Alert state for reconciliation
   ```
   
   7. The worker node that is in the Stopped state cannot be removed.
   
   An exception is thrown:
   
   <img width="1623" height="528" alt="Image" src="https://github.com/user-attachments/assets/32b8bad2-db9c-4686-ac57-3c26b9f9d378" />
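
The two log lines in step 6 suggest why the cluster stays in Alert: the scanner appears to test the VMs against both an all-Running and an all-Stopped expectation, and a cluster with mixed VM states satisfies neither. The following is a minimal sketch of that reading, not the actual `KubernetesClusterManagerImpl` logic. `VmState`, `AlertReconciliation`, and both methods are hypothetical names introduced only for illustration.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; these are not CloudStack classes.
enum VmState { RUNNING, STOPPED }

class AlertReconciliation {
    // True only when every VM in the cluster is in the expected state.
    static boolean allVmsIn(List<VmState> vmStates, VmState expected) {
        return vmStates.stream().allMatch(s -> s == expected);
    }

    // Returns the state an Alert cluster could be reconciled to,
    // or "Alert" when the VM states are mixed.
    static String reconcile(List<VmState> vmStates) {
        if (allVmsIn(vmStates, VmState.RUNNING)) {
            return "Running";
        }
        if (allVmsIn(vmStates, VmState.STOPPED)) {
            return "Stopped";
        }
        return "Alert";
    }
}
```

Under this assumption, the cluster from the steps above (existing nodes Running, the new worker Stopped) can never leave Alert on its own until the stopped node is started or removed.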
   
   
   ### What to do about it?
   
   The CKS cluster should go back to the Running state, since the scaling failed due to insufficient capacity.
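
One way to picture the expected behaviour is a scale-up that restores the previous cluster state when a node cannot be placed. This is a rough sketch under stated assumptions, not CloudStack code: `NoCapacityException`, `NodeProvisioner`, and `ScalableCluster` are hypothetical, and cluster states are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration; not CloudStack classes.
class NoCapacityException extends Exception {
    NoCapacityException(String message) {
        super(message);
    }
}

interface NodeProvisioner {
    // Starts one worker node and returns its name, or throws if no host can fit it.
    String startWorker() throws NoCapacityException;
}

class ScalableCluster {
    String state = "Running";
    final List<String> workers = new ArrayList<>();

    ScalableCluster(List<String> initialWorkers) {
        workers.addAll(initialWorkers);
    }

    // Tries to grow the cluster to targetSize. On a capacity failure, drops the
    // nodes added by this attempt and puts the cluster back in Running.
    boolean scaleUp(int targetSize, NodeProvisioner provisioner) {
        List<String> before = new ArrayList<>(workers);
        state = "Scaling";
        try {
            while (workers.size() < targetSize) {
                workers.add(provisioner.startWorker());
            }
            state = "Running";
            return true;
        } catch (NoCapacityException e) {
            // A real fix would also have to clean up the stopped worker VM here.
            workers.clear();
            workers.addAll(before);
            state = "Running";
            return false;
        }
    }
}
```

In this sketch the failed attempt leaves the cluster at its original size and in Running, rather than in Alert with an unremovable stopped node.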
   
   Currently, only resource limits are checked during the scaling operation, which was added with this PR:
   
   https://github.com/apache/cloudstack/pull/12167
   
   We should also check host capacity before scaling.
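
A pre-scale capacity check could look roughly like the sketch below. It is only an illustration of the idea: `HostCapacity` and `ScaleCapacityCheck` are hypothetical types, the CPU and memory figures are made-up inputs, and CloudStack's real allocators also take overprovisioning factors, tags, affinity, and storage into account, none of which is modelled here.

```java
import java.util.List;

// Hypothetical types for illustration; not CloudStack's capacity API.
class HostCapacity {
    final String name;
    final long freeCpuMhz;
    final long freeMemoryMb;

    HostCapacity(String name, long freeCpuMhz, long freeMemoryMb) {
        this.name = name;
        this.freeCpuMhz = freeCpuMhz;
        this.freeMemoryMb = freeMemoryMb;
    }
}

class ScaleCapacityCheck {
    // Number of identically sized worker nodes that fit on one host.
    static long slots(HostCapacity host, long nodeCpuMhz, long nodeMemoryMb) {
        long byCpu = Math.max(0, host.freeCpuMhz) / nodeCpuMhz;
        long byMemory = Math.max(0, host.freeMemoryMb) / nodeMemoryMb;
        return Math.min(byCpu, byMemory);
    }

    // True when the hosts together can place all of the requested new nodes.
    static boolean canFit(List<HostCapacity> hosts, int newNodes,
                          long nodeCpuMhz, long nodeMemoryMb) {
        if (nodeCpuMhz <= 0 || nodeMemoryMb <= 0) {
            throw new IllegalArgumentException("node size must be positive");
        }
        long total = 0;
        for (HostCapacity host : hosts) {
            total += slots(host, nodeCpuMhz, nodeMemoryMb);
        }
        return total >= newNodes;
    }
}
```

If such a check ran before any worker VM was created, the scale request could be rejected up front and the cluster would never leave the Running state.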


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
