We just upgraded to CloudStack 4.2 (from 4.0.2), and now our Xen cluster will not stay connected; the hosts are in the Alert state.
Here is a snippet from management-server.log:

2013-10-10 09:22:13,995 DEBUG [cloud.capacity.CapacityManagerImpl] (AgentTaskPool-1:null) Found 6 VMs on host 4
2013-10-10 09:22:14,003 ERROR [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Monitor ComputeCapacityListener says there is an error in the connect process for 4 due to null
java.lang.NullPointerException
    at com.cloud.capacity.CapacityManagerImpl.updateCapacityForHost(CapacityManagerImpl.java:543)
    at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
    at com.cloud.capacity.ComputeCapacityListener.processConnect(ComputeCapacityListener.java:78)
    at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:587)
    at com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1479)
    at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1762)
    at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1924)
    at com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.run(AgentManagerImpl.java:1130)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
2013-10-10 09:22:14,004 INFO [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Host 4 is disconnecting with event AgentDisconnected
2013-10-10 09:22:14,008 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) The next status of agent 4is Alert, current status is Connecting
2013-10-10 09:22:14,008 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Deregistering link for 4 with state Alert
2013-10-10 09:22:14,008 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Remove Agent : 4
2013-10-10 09:22:14,009 DEBUG [agent.manager.DirectAgentAttache] (AgentTaskPool-1:null) Processing disconnect 4
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer_EnhancerByCloudStack_434ade97
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl_EnhancerByCloudStack_a0f690d
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.NetworkManagerImpl_EnhancerByCloudStack_1ba07aa0
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.hypervisor.vmware.manager.VmwareManagerImpl_EnhancerByCloudStack_b315799a
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl_EnhancerByCloudStack_48612ba4
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_e1d29845
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_5cb66068
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.agent.manager.AgentMonitor
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2013-10-10 09:22:14,009 DEBUG [cloud.network.NetworkUsageManagerImpl] (AgentTaskPool-1:null) Disconnected called on 4 with status Alert
2013-10-10 09:22:14,009 DEBUG [agent.manager.AgentManagerImpl] (AgentTaskPool-1:null) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
2013-10-10 09:22:14,014 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 4, name = c14-c1-3]
2013-10-10 09:22:14,026 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent status update: [id = 4; name = c14-c1-3; old status = Connecting; event = AgentDisconnected; new status = Alert; old update count = 2314; new update count = 2315]
2013-10-10 09:22:14,026 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentTaskPool-1:null) Notifying other nodes of to disconnect
2013-10-10 09:22:14,029 WARN [cloud.resource.ResourceManagerImpl] (AgentTaskPool-1:null) Unable to connect due to
com.cloud.utils.exception.CloudRuntimeException: Unable to connect 4
    at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:606)
    at com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1479)
    at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1762)
    at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1924)
    at com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.run(AgentManagerImpl.java:1130)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.NullPointerException
    at com.cloud.capacity.CapacityManagerImpl.updateCapacityForHost(CapacityManagerImpl.java:543)
    at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
    at com.cloud.capacity.ComputeCapacityListener.processConnect(ComputeCapacityListener.java:78)
    at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:587)
    ... 7 more
2013-10-10 09:22:14,030 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 4, name = c14-c1-3]
2013-10-10 09:22:14,041 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent status update: [id = 4; name = c14-c1-3; old status = Alert; event = AgentDisconnected; new status = Alert; old update count = 2315; new update count = 2316]

I have not been able to find any information online about this error or about how to get the cluster to connect again. The cluster is up to date on hotfixes and was working fine before the upgrade. It is a 3-node cluster with fiber LUNs. Any help on this is greatly appreciated.

--
Ryan James
ColocateUSA
http://www.colocateUSA.net
