Dear all, We are using CloudStack 4.2.0 with KVM hypervisor and Ceph RBD primary storage.
Since earlier this month, we encountered a strange problem where we cannot deploy new VM from any template which are newly created. Deploying VMs from existing template still works. This is a new problem we experienced, which only started since earlier this month -- we didn't have any issues creating new templates and deploying VMs from the new templates until late last month (December 2015). This is the error messages from the CloudStack management server. It seems that the process of copying the template (in .qcow format) from the secondary storage to the primary storage fails. === 2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (AgentManager-Handler-5:null) Seq 70-1791707721: Processing: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{" result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7","wait":0}}] } 2016-01-24 20:54:52,044 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-5:null) Seq 70-1791707730: Sending now. is current sequence. 2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Seq 70-1791707721: Received: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, { CopyCmdAnswer } } 2016-01-24 20:54:52,065 DEBUG [agent.transport.Request] (AgentManager-Handler-6:null) Seq 37-2102353958: Processing: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2469)\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1230)\n\tat com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat com.cloud.utils.nio.Task.run(Task.java:83)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:679)\n","wait":0}}] } 2016-01-24 20:54:52,066 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 37-2102353958: Received: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, { Answer } } 2016-01-24 20:54:52,123 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) releasing lock for VMTemplateStoragePool 323 2016-01-24 20:54:52,123 WARN [utils.db.Merovingian2] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Was unable to find lock for the key template_spool_ref323 and thread id 1999363220 2016-01-24 20:54:52,123 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7 2016-01-24 20:54:52,124 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to contact resource. com.cloud.exception.StorageUnavailableException: Resource [StoragePool:214] is unreachable: Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7 at com.cloud.storage.VolumeManagerImpl.recreateVolume(VolumeManagerImpl.java:2544) at com.cloud.storage.VolumeManagerImpl.prepare(VolumeManagerImpl.java:2592) at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:889) at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578) at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227) at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209) at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406) at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966) at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952) at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125) at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420) at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158) at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) 2016-01-24 20:54:52,152 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Cleaning up resources for the vm VM[User|test-ubuntu14-20160124a] in Starting state === On the hypervisor host, I can't find any useful clues from the agent.log file other than some messages stating that it tries to stop the VM (which doesn't exist yet since it's still in creating process). === 2016-01-24 20:54:52,161 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: KVM, for vm: i-1939-4019-VM, continue 2016-01-24 20:54:52,162 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: LXC, for vm: i-1939-4019-VM, continue 2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find which hypervisor the vm used , then use the default hypervisor 2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM' 2016-01-24 20:54:52,164 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM' 2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM' 2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname i-1939-4019-VM 2016-01-24 20:54:54,682 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. 2016-01-24 20:54:54,683 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Try to stop the vm at first 2016-01-24 20:54:54,685 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) VM i-1939-4019-VM doesn't exist, no need to stop it === Connectivity from the particular hypervisor host (as well as our other hypervisor hosts) to both the primary and secondary storages are fine. I can mount folders to the secondary storage, and connection to the RBD primary storage is also OK since our current running VMs are having no issues. Any advice on how I can further troubleshoot and resolve this problem is greatly appreciated. We are not able to release new templates because of this problem. Thank you. Best regards, -ip-