[ https://issues.apache.org/jira/browse/CLOUDSTACK-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752162#comment-13752162 ]
Koushik Das commented on CLOUDSTACK-4499: ----------------------------------------- addHost creates an entry in the database if it is able to discover the host using the credentials specified and returns that. There is a separate thread that creates the agent attache for the host and that may not complete by the time addHost returns. The status of the host goes to 'Up' once the agent creation process completes successfully. If there are any issues encountered then the MS disconnects the agent attache (so that proper cleanup happens for resources) and then reinitiates the agent creation after some interval. So if any operation relies on the status of host to be 'Up' should check for it before proceeding. Coming to the 2nd issue related to XS 6.1/6.2 the SMlogs needs to be looked at to see why there is a failure in local link network creation. > Xen6.1/Xen6.2 hosts initially transition to 'Alert' and then to 'Up' after > addHost > ---------------------------------------------------------------------------------- > > Key: CLOUDSTACK-4499 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4499 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: API, Management Server, XenServer > Affects Versions: 4.2.0 > Reporter: Prasanna Santhanam > Priority: Critical > Fix For: 4.2.1 > > Attachments: cloud.sql.tar.bz2, mslog.tar.bz2, SMlog > > > The same bug was reported in CLOUDSTACK-3839 which has been determined to be a > UI issue but the problem is with the addHost API and Xenserver hosts 6.1/6.2. > When a freshly provisioned 6.1/6.2 host is added to cloudstack it initially > goes into 'Alert' state thereby failing the storage pool creation > subsequently. > StoragePool addition requires the hosts to be in Up state. But a minute or two > later the Xenserver host automatically moves back to Up state after which > storage pool addition can occur > When the host is added to cloudstack: > 2013-08-26 06:46:22,965 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-1:null) Seq 1-1916403719: Executing request > 2013-08-26 06:46:23,261 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-1:null) Can't find a vif on dom0 for link local, creating a new > one > 2013-08-26 06:46:23,274 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-1:null) Lowest available Vif device number: 0 for VM: Control > domain on host: apache-81-3 > 2013-08-26 06:46:23,600 WARN [xen.resource.CitrixResourceBase] > (DirectAgent-1:null) Unable to create local link network > The server failed to handle your request, due to an internal error. The > given message may give details useful for debugging the problem. > at com.xensource.xenapi.Types.checkResponse(Types.java:1694) > at com.xensource.xenapi.Connection.dispatch(Connection.java:368) > at > com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:909) > at com.xensource.xenapi.VIF.plug(VIF.java:846) > Host marked Alert: > 2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] > (catalina-exec-14:null) Sending Disconnect to listener: > com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener > 2013-08-26 06:46:23,616 DEBUG [cloud.network.NetworkUsageManagerImpl] > (catalina-exec-14:null) Disconnected called on 1 with status Alert > 2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] > (catalina-exec-14:null) Sending Disconnect to listener: > com.cloud.consoleproxy.ConsoleProxyListener > 2013-08-26 06:46:23,619 DEBUG [cloud.host.Status] (catalina-exec-14:null) > Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host > id = 1, name = apache-81-3] > 2013-08-26 06:46:23,625 DEBUG [cloud.host.Status] (catalina-exec-14:null) > Agent status update: [id = 1; name = apache-81-3; old status = Connecting; > event = AgentDisconnected; new status = Alert; old update count = 1; new > update count = 2] > Automatic retry by CloudStack a few moments later: > 2013-08-26 06:46:52,752 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager > Timer:null) Completed acquiring hosts for clusters not owned by any > management server > 2013-08-26 06:46:52,756 DEBUG [agent.manager.ClusteredAgentManagerImpl] > (ClusteredAgentManager Timer:null) Found 2 unmanaged direct hosts, processing > connect for them... > 2013-08-26 06:46:52,757 DEBUG [agent.manager.ClusteredAgentManagerImpl] > (ClusteredAgentManager Timer:null) Loading directly connected host > 1(apache-81-3) > Meanwhile Storage Pool addition fails: > 2013-08-26 06:46:52,766 DEBUG [cloud.api.ApiServlet] (catalina-exec-17:null) > ===END=== 10.208.8.5 -- GET > username=root&apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&hypervisor=XenServer&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=addHost&url=http%3A%2F%2Fapache-81-2&signature=oJqOno9flCtPMSSHH%2FZD%2BJl5QXw%3D&response=json > 2013-08-26 06:46:52,773 DEBUG [cloud.api.ApiServlet] (catalina-exec-16:null) > ===START=== 10.208.8.5 -- GET > apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&name=z0p0c0ps0&url=nfs%3A%2F%2Fnfs.fmt.vmops.com%3A%2Fexport%2Fautomation%2Facs%2Fprimary&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=createStoragePool&signature=70CuFbmf58PU4X0QbF1tKbBGT60%3D&response=json > 2013-08-26 06:46:52,787 DEBUG [cloud.network.NetworkModelImpl] > (ClusteredAgentManager Timer:null) Failed to retrive the default label for > storage traffic:zone: 1 hypervisor: XenServer due to:Unable to find the > default physical network with traffic=Storage in the specified zone id > 2013-08-26 06:46:52,839 DEBUG > [datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl] > (catalina-exec-16:null) createPool Params @ scheme - nfs storageHost - > nfs.fmt.vmops.com hostPath - /export/automation/acs/primary port - -1 > 2013-08-26 06:46:52,889 DEBUG [agent.manager.ClusteredAgentManagerImpl] > (ClusteredAgentManager Timer:null) Loading directly connected host > 2(apache-81-2) > 2013-08-26 06:46:52,892 DEBUG [agent.manager.AgentManagerImpl] > (AgentTaskPool-2:null) Simulating start for resource > 192267e4-d97d-4e37-b899-82b157e3166f id 1 > 2013-08-26 06:46:52,900 DEBUG [cloud.storage.StorageManagerImpl] > (catalina-exec-16:null) Failed to add data store > com.cloud.utils.exception.CloudRuntimeException: No host up to associate a > storage pool with in cluster 1 > at > org.apache.cloudstack.storage.datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl.attachCluster(CloudStackPrimaryDataStoreLifeCycleImpl.java:371) > at > com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:749) > at > com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:177) > Host comes back to Up state: > 2013-08-26 06:46:53,898 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-4:null) Seq 1-1519321095: Executing request > 2013-08-26 06:46:54,279 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-4:null) already have a vif on dom0 for link local network > 2013-08-26 06:46:54,703 DEBUG [agent.transport.Request] > (AgentTaskPool-2:null) Seq -1--1: Startup request from directly connected > host: { Cmd , MgmtId: -1, via: -1, Ver: v1, Flags: 11, > [{"cpus":4,"speed":2261,"memory":16190149248,"dom0MinMemory":588251136,"poolSync":false,"_clusterVMStates":{},"caps":"xen-3.0-x86_64 > , xen-3.0-x86_32p , hvm-3.0-x86_32 , hvm-3.0-x86_32p , > hvm-3.0-x86_64","pool":"9dfa936a-30f7-b5d1-4d4e-b5a75facb313","hypervisorType":"XenServer","hostDetails":{"product_version":"6.1.0","private.network.device":"Pool-wide > network associated with > eth0","com.cloud.network.Networks.RouterPrivateIpStrategy":"DcGlobal","product_brand":"XenServer","product_version_text_short":"6.1","can_bridge_firewall":"false"},"hypervisorVersion":"6.1.0","type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","name":"apache-81-3","version":"4.2.0","iqn":"iqn.2013-08.com.vmops.fmt:691cf4d5","publicIpAddress":"10.208.8.69","publicNetmask":"255.255.255.192","publicMacAddress":"b8:ac:6f:94:f1:67","privateIpAddress":"10.208.8.69","privateMacAddress":"b8:ac:6f:94:f1:67","privateNetmask":"255.255.255.192","storageIpAddress":"10.208.8.69","storageNetmask":"255.255.255.192","storageMacAddress":"b8:ac:6f:94:f1:67","wait":0},{"totalSize":0,"poolInfo":{"uuid":"ae248ed7-dd2a-cfc8-53f7-f43661c596c8","host":"10.208.8.69","localPath":"lvm","hostPath":"lvm","poolType":"LVM","capacityBytes":491505319936,"availableBytes":491501125632},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","wait":0}] > } > 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to > BaremetalPxeManagerImpl_EnhancerByCloudStack_d79b93ae > 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to > PremiumSecondaryStorageManagerImpl_EnhancerByCloudStack_b51c5fde > 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to > LxcServerDiscoverer_EnhancerByCloudStack_449ab3e6 > 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to NiciraNvp > 2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to > OvmDiscoverer_EnhancerByCloudStack_ec7ad054 > 2013-08-26 06:46:54,710 DEBUG [cloud.resource.ResourceManagerImpl] > (AgentTaskPool-2:null) Dispatching resource state event > CREATE_HOST_VO_FOR_DIRECT_CONNECT to > XcpServerDiscoverer_EnhancerByCloudStack_831158ad > 2013-08-26 06:46:54,710 INFO [xen.discoverer.XcpServerDiscoverer] > (AgentTaskPool-2:null) Host: apache-81-3 connected with hypervisor type: > XenServer. Checking CIDR... > 2013-08-26 06:46:54,735 DEBUG [cloud.resource.ResourceState] > (AgentTaskPool-2:null) Resource state update: [id = 1; name = apache-81-3; > old state = Enabled; event = InternalCreated; new state = Enabled] > 2013-08-26 06:46:54,736 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) > Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = > 1, name = apache-81-3] > 2013-08-26 06:46:54,742 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) > Agent status update: [id = 1; name = apache-81-3; old status = Connecting; > event = AgentConnected; new status = Connecting; old update count = 3; new > update count = 4] > 2013-08-26 06:46:54,946 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) > Agent status update: [id = 1; name = apache-81-3; old status = Connecting; > event = Ready; new status = Up; old update count = 4; new update count = 5] > There appear to be two problems: > 1. addHost API returns successfully even if the host has transitioned to > 'Alert' state > 2. Something wrong with Xenserver 6.2/6.1 hosts that makes this happen > This is on the latest off the 4.2 branch (8689f7f). > Attaching management server log and Database Dump. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira