I found this open issue for 4.2.1. It seems to be effecting my 4.2 install as well.
https://issues.apache.org/jira/browse/CLOUDSTACK-5014 I am seeing this issue as well. It seems that the timeout occurs on any copy from secondary storage that exceeds 10 Minutes on vmware 5.1. This seems to affect only templates and images that are large enough not to be copied from secondary storage in that window. Small templates like the included CentOS and some home-brew micro ubuntu templates work fine. I was unable to locate any global setting that is set to 10 minutes, the closest thing I could find was secstorage.cmd.execution.time.max which I set to 240 minutes in my environment. I was able to mitigate some of this by using Fibrechannel as the target, it seemed like my template size was then limited what my 1GBE on secondary storage could transfer in 10 minutes. The deploy fails in CS but the Clone VM operation continues to run on the vmware side and does eventually complete leaving me with an orphaned ROOT-XXX vm on the vmware side. It seems that the solution would be to set that hard timeout to a value greater than 10 minutes but I can not locate it anywhere. The timestamps seem to be the smoking gun. Any suggestions would be appreciated. ----Begin Template Deploy---- 172.23.12.79 userdata,user-data none metadata,service-offering Micro Instance metadata,availability-zone US_EAST_LEB_Z01 metadata,local-ipv4 172.23.12.79 metadata,local-hostname MSWOAP200106 metadata,public-ipv4 209.212.253.138 metadata,public-hostname 209.212.253.138 metadata,instance-id bcb8fd42-5fe9-46be-b77f-13daebfd9c35 metadata,vm-id bcb8fd42-5fe9-46be-b77f-13daebfd9c35 metadata,public-keys none metadata,cloud-identifier CloudStack-{bc067003-923b-453e-ad37-5e06c2670966} 2013-11-21 15:41:14,957 INFO [vmware.resource.VmwareResource] (DirectAgent-8:esx01.lab.zimcom.net) vm_data command on domain router 172.23.5.119 completed 2013-11-21 15:41:14,958 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-8:null) Seq 15-1363607559: Response Received: 2013-11-21 15:41:14,958 DEBUG [agent.transport.Request] (DirectAgent-8:null) Seq 15-1363607559: Processing: { Ans: , MgmtId: 345049822399, via: 15, Ver: v1, Flags: 110, [{"com.cloud.agent.api.Answer":{"result":true,"wait":0}},{"com.cloud.agent.api.Answer":{"result":true,"wait":0}}] } 2013-11-21 15:41:14,958 DEBUG [agent.manager.AgentAttache] (DirectAgent-8:null) Seq 15-1363607559: No more commands found 2013-11-21 15:41:14,958 DEBUG [agent.transport.Request] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) Seq 15-1363607559: Received: { Ans: , MgmtId: 345049822399, via: 15, Ver: v1, Flags: 110, { Answer, Answer } } 2013-11-21 15:41:14,960 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) Service SecurityGroup is not supported in the network id=215 2013-11-21 15:41:14,962 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) Checking if we need to prepare 1 volumes for VM[User|MSWOAP200106] 2013-11-21 15:41:14,988 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) template 255 is already in store:1, type:Image 2013-11-21 15:41:15,001 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) template 255 is already in store:2, type:Primary 2013-11-21 15:41:15,038 DEBUG [storage.motion.AncientDataMotionStrategy] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) copyAsync inspecting src type TEMPLATE copyAsync inspecting dest type VOLUME 2013-11-21 15:41:15,056 DEBUG [agent.transport.Request] (Job-Executor-1:job-652 = [ 9bad7d26-5de2-4d97-85de-1241cc28fc16 ]) Seq 15-1363607560: Sending { Cmd , MgmtId: 345049822399, via: 15, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"9c1b9052222e33d59385333fed5c973d" --- End Exception--- 2013-11-21 15:51:15,228 ERROR [storage.resource.VmwareStorageProcessor] (DirectAgent-11:esx01.lab.zimcom.net) clone volume from base image failed due to Exception: javax.xml.ws.WebServiceException Message: java.net.SocketTimeoutException: Read timed out javax.xml.ws.WebServiceException: java.net.SocketTimeoutException: Read timed out at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(HttpClientTransport.java:201) at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(HttpTransportPipe.java:151) at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(HttpTransportPipe.java:83) at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(DeferredTransportPipe.java:78) at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Fiber.java:587) at com.sun.xml.internal.ws.api.pipe.Fiber._doRun(Fiber.java:546) at com.sun.xml.internal.ws.api.pipe.Fiber.doRun(Fiber.java:531) at com.sun.xml.internal.ws.api.pipe.Fiber.runSync(Fiber.java:428) at com.sun.xml.internal.ws.client.Stub.process(Stub.java:211) at com.sun.xml.internal.ws.client.sei.SEIStub.doProcess(SEIStub.java:124) Steve Searles
