Hi Indra,
Can you check whether the template on the secondary storage is actually valid? You can do this by running: qemu-img check <templatename>.qcow2 If you look at the logs (/var/log/cloudstack/agent/agent.log or agent.out) on the host that the copy is occurring on (the host is often randomly selected, so you might need to tail logs on all the hosts in the cluster and run the job again), do you see anything odd being reported back when the agent attempts to copy the template from secondary storage to primary storage? If you look in MySQL , take a look at the vm_template table and see what format is being reported for your new template. I've seen conversion issues before when qemu-img isn't provided the correct file formats. Since you're using RBD, the destination format should be raw. - Si ________________________________ From: Indra Pramana <[email protected]> Sent: Thursday, February 18, 2016 5:05 AM To: [email protected] Cc: Simon Weller Subject: Re: VM deployment from new template fails Hi Simon, Good day to you, and thank you for your reply. I apologise for the delay in my reply. I have run the health check on the SSVM and the result seems to be OK. === root@s-2985-VM:/var/log/cloud# /usr/local/cloud/systemvm/ssvm-check.sh ================================================ First DNS server is 8.8.8.8 PING 8.8.8.8 (8.8.8.8): 56 data bytes 64 bytes from 8.8.8.8<http://8.8.8.8>: icmp_seq=0 ttl=58 time=2.173 ms 64 bytes from 8.8.8.8<http://8.8.8.8>: icmp_seq=1 ttl=58 time=2.126 ms --- 8.8.8.8 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max/stddev = 2.126/2.150/2.173/0.024 ms Good: Can ping DNS server ================================================ Good: DNS resolves download.cloud.com<http://download.cloud.com> ================================================ NFS is currently mounted Mount point is /mnt/SecStorage/cb10c4c8-84a1-3eee-b983-36b6fc8f7057 Good: Can write to mount point Mount point is /mnt/SecStorage/41cab17f-4dbc-37a5-b8d2-cdc153301319 Good: Can write to mount point ================================================ Management server is *.*.3.3. Checking connectivity. Good: Can connect to management server port 8250 ================================================ Good: Java process is running ================================================ Tests Complete. Look for ERROR or WARNING above. === No error and warning. I also checked /var/log/cloud/cloud.out log file and can't find anything. Do you think destroying and recreating the SSVM would help? >From what I can see, deploying VM from existing templates are OK, only >deploying VM from new templates (which was just created and not yet copied >over to the primary storage, I think) fails. So I think the problem is caused >during the copying from secondary storage to primary storage. But I don't know >what causes the copying to fail. 
- Si

________________________________
From: Indra Pramana <[email protected]>
Sent: Thursday, February 18, 2016 5:05 AM
To: [email protected]
Cc: Simon Weller
Subject: Re: VM deployment from new template fails

Hi Simon,

Good day to you, and thank you for your reply. I apologise for the delay in my reply.

I have run the health check on the SSVM and the result seems to be OK:

===
root@s-2985-VM:/var/log/cloud# /usr/local/cloud/systemvm/ssvm-check.sh
================================================
First DNS server is 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=58 time=2.173 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=58 time=2.126 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.126/2.150/2.173/0.024 ms
Good: Can ping DNS server
================================================
Good: DNS resolves download.cloud.com
================================================
NFS is currently mounted
Mount point is /mnt/SecStorage/cb10c4c8-84a1-3eee-b983-36b6fc8f7057
Good: Can write to mount point
Mount point is /mnt/SecStorage/41cab17f-4dbc-37a5-b8d2-cdc153301319
Good: Can write to mount point
================================================
Management server is *.*.3.3. Checking connectivity.
Good: Can connect to management server port 8250
================================================
Good: Java process is running
================================================
Tests Complete. Look for ERROR or WARNING above.
===

No errors or warnings. I also checked the /var/log/cloud/cloud.out log file and can't find anything. Do you think destroying and recreating the SSVM would help?

From what I can see, deploying VMs from existing templates works; only deploying a VM from a new template (one that was just created and, I think, not yet copied over to the primary storage) fails. So I believe the problem occurs during the copy from secondary storage to primary storage, but I don't know what causes the copy to fail.

This is another attempt I did today:

====
2016-02-18 17:44:06,328 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 34-1602142919: Received: { Ans: , MgmtId: 161342671900, via: 34, Ver: v1, Flags: 110, { Answer, Answer } }
2016-02-18 17:44:06,330 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Service SecurityGroup is not supported in the network id=238
2016-02-18 17:44:06,333 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Checking if we need to prepare 2 volumes for VM[User|test-centos67-20160218]
2016-02-18 17:44:06,348 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) template 514 is already in store:46, type:Image
2016-02-18 17:44:06,354 DEBUG [storage.datastore.PrimaryDataStoreImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Not found (templateId:514poolId:214) in template_spool_ref, persisting it
2016-02-18 17:44:06,384 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) template 514 is already in store:214, type:Primary
2016-02-18 17:44:06,387 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Found template ebe8d28c-03bc-49b6-8b21-442ecfff8270 in storage pool 214 with VMTemplateStoragePool id: 327
2016-02-18 17:44:06,398 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Acquire lock on VMTemplateStoragePool 327 with timeout 3600 seconds
2016-02-18 17:44:06,409 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) lock is acquired for VMTemplateStoragePool 327
2016-02-18 17:44:06,449 DEBUG [storage.motion.AncientDataMotionStrategy] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) copyAsync inspecting src type TEMPLATE copyAsync inspecting dest type TEMPLATE
2016-02-18 17:44:06,466 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 37-292022280: Sending { Cmd , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/2/514/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2","uuid":"790ef1f6-8b80-4305-b561-5397d43295ad","id":514,"format":"QCOW2","accountId":2,"hvm":true,"displayText":"CentOS 6.7 (64-bit) 20160218","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://*.*.11.31/mnt/vol1/sec-storage2","_role":"Image"}},"name":"ebe8d28c-03bc-49b6-8b21-442ecfff8270","hypervisorType":"KVM"}},"destTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"uuid":"790ef1f6-8b80-4305-b561-5397d43295ad","id":514,"format":"QCOW2","accountId":2,"hvm":true,"displayText":"CentOS 6.7 (64-bit) 20160218","imageDataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"d433809b-01ea-3947-ba0f-48077244e4d6","id":214,"poolType":"RBD","host":"*.*.com","path":"*-*-01","port":6789}},"name":"ebe8d28c-03bc-49b6-8b21-442ecfff8270","hypervisorType":"KVM"}},"executeInSequence":true,"wait":10800}}] }
2016-02-18 17:47:24,121 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 37-292022280: Received: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 110, { CopyCmdAnswer } }
2016-02-18 17:47:24,176 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) releasing lock for VMTemplateStoragePool 327
2016-02-18 17:47:24,177 WARN [utils.db.Merovingian2] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Was unable to find lock for the key template_spool_ref327 and thread id 1847536824
2016-02-18 17:47:24,177 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Unable to create Vol[7687|vm=4070|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 to 062cc44b-2339-49a4-9142-b55f5e0dee7d
2016-02-18 17:47:24,177 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Unable to contact resource.
2016-02-18 17:47:24,199 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Cleaning up resources for the vm VM[User|test-centos67-20160218] in Starting state
2016-02-18 17:47:24,202 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 68-1775429003: Sending { Cmd , MgmtId: 161342671900, via: 68, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"i-1976-4070-VM","wait":0}}] }
2016-02-18 17:47:24,860 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 68-1775429003: Received: { Ans: , MgmtId: 161342671900, via: 68, Ver: v1, Flags: 110, { StopAnswer } }
2016-02-18 17:47:24,877 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Changing active number of nics for network id=238 on -1
2016-02-18 17:47:24,890 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Successfully released network resources for the vm VM[User|test-centos67-20160218]
2016-02-18 17:47:24,890 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Successfully cleanued up resources for the vm VM[User|test-centos67-20160218] in Starting state
=====

The two significant error messages I noted were:

- Failed to copy /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 to 062cc44b-2339-49a4-9142-b55f5e0dee7d; and
- Was unable to find lock for the key template_spool_ref327 and thread id 1847536824

I am not sure whether the copy failed because of a locking issue. Can anyone advise?
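In the meantime, to get at the underlying error, I plan to try reproducing the copy manually from the host that handled the CopyCommand, roughly along these lines. The source path and volume name come from the log above; the RBD pool name and cephx user are placeholders, and the id=/conf= options only apply if cephx authentication is enabled:

===
# Re-run the qcow2 -> raw conversion onto RBD by hand, on the KVM host
qemu-img convert -f qcow2 -O raw \
  /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 \
  "rbd:<pool-name>/test-062cc44b:id=<cephx-user>:conf=/etc/ceph/ceph.conf"

# Verify the test image arrived on the pool, then remove it again
rbd ls -p <pool-name> | grep test-062cc44b
rbd rm <pool-name>/test-062cc44b
===

If the manual conversion fails too, whatever qemu-img prints should be more specific than the generic "Failed to copy" in the management server log.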
Looking forward to your reply, thank you.

Cheers.

On Mon, Jan 25, 2016 at 9:29 PM, Simon Weller <[email protected]> wrote:

Have you done any debugging on the Secondary Storage VM yet? That's probably where I would focus. Here's a guide:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting

- Si

________________________________________
From: Indra Pramana <[email protected]>
Sent: Sunday, January 24, 2016 9:02 PM
To: [email protected]; [email protected]
Subject: VM deployment from new template fails

Dear all,

We are using CloudStack 4.2.0 with the KVM hypervisor and Ceph RBD primary storage. Since earlier this month, we have encountered a strange problem: we cannot deploy a new VM from any newly created template, while deploying VMs from existing templates still works. This is new behaviour -- we had no issues creating new templates and deploying VMs from them until late last month (December 2015).

These are the error messages from the CloudStack management server. It seems that the process of copying the template (in .qcow2 format) from the secondary storage to the primary storage fails:

===
2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (AgentManager-Handler-5:null) Seq 70-1791707721: Processing: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7","wait":0}}] }
2016-01-24 20:54:52,044 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-5:null) Seq 70-1791707730: Sending now. is current sequence.
2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Seq 70-1791707721: Received: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, { CopyCmdAnswer } }
2016-01-24 20:54:52,065 DEBUG [agent.transport.Request] (AgentManager-Handler-6:null) Seq 37-2102353958: Processing: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2469)\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1230)\n\tat com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat com.cloud.utils.nio.Task.run(Task.java:83)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:679)\n","wait":0}}] }
2016-01-24 20:54:52,066 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 37-2102353958: Received: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, { Answer } }
2016-01-24 20:54:52,123 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) releasing lock for VMTemplateStoragePool 323
2016-01-24 20:54:52,123 WARN [utils.db.Merovingian2] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Was unable to find lock for the key template_spool_ref323 and thread id 1999363220
2016-01-24 20:54:52,123 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7
2016-01-24 20:54:52,124 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to contact resource.
com.cloud.exception.StorageUnavailableException: Resource [StoragePool:214] is unreachable: Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7
        at com.cloud.storage.VolumeManagerImpl.recreateVolume(VolumeManagerImpl.java:2544)
        at com.cloud.storage.VolumeManagerImpl.prepare(VolumeManagerImpl.java:2592)
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:889)
        at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578)
        at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227)
        at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952)
        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
        at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)
2016-01-24 20:54:52,152 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Cleaning up resources for the vm VM[User|test-ubuntu14-20160124a] in Starting state
===

On the hypervisor host, I can't find any useful clues in the agent.log file, other than some messages stating that it tries to stop the VM (which doesn't exist yet, since it is still being created):
===
2016-01-24 20:54:52,161 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: KVM, for vm: i-1939-4019-VM, continue
2016-01-24 20:54:52,162 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: LXC, for vm: i-1939-4019-VM, continue
2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find which hypervisor the vm used , then use the default hypervisor
2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,164 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname i-1939-4019-VM
2016-01-24 20:54:54,682 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful.
2016-01-24 20:54:54,683 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Try to stop the vm at first
2016-01-24 20:54:54,685 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) VM i-1939-4019-VM doesn't exist, no need to stop it
===

Connectivity from the particular hypervisor host (as well as from our other hypervisor hosts) to both the primary and secondary storage is fine: I can mount folders from the secondary storage, and the connection to the RBD primary storage is also OK, since our currently running VMs have no issues.
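For reference, these are roughly the checks I ran. The NFS server, export path, and pool name are masked placeholders here, and the Ceph commands assume the client keyring used by the CloudStack agent is available on the host:

===
# From a hypervisor host: verify the secondary storage NFS export mounts
showmount -e <nfs-server>
mkdir -p /mnt/secstorage-test
mount -t nfs <nfs-server>:<export-path> /mnt/secstorage-test

# From the same host: verify the Ceph cluster and RBD pool are reachable
ceph -s
rbd ls -p <pool-name>
===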
Any advice on how I can further troubleshoot and resolve this problem is greatly appreciated. We are not able to release new templates because of this problem.

Thank you.

Best regards,
-ip-