Hi Indra,
Can you check whether the template on the secondary storage is actually valid? You can do this by running: qemu-img check <templatename>.qcow2 If you look at the logs (/var/log/cloudstack/agent/agent.log or agent.out) on the host that the copy is occurring on (the host is often randomly selected, so you might need to tail logs on all the hosts in the cluster and run the job again), do you see anything odd being reported back when the agent attempts to copy the template from secondary storage to primary storage? If you look in MySQL , take a look at the vm_template table and see what format is being reported for your new template. I've seen conversion issues before when qemu-img isn't provided the correct file formats. Since you're using RBD, the destination format should be raw. - Si ________________________________ From: Indra Pramana <[email protected]> Sent: Thursday, February 18, 2016 5:05 AM To: [email protected] Cc: Simon Weller Subject: Re: VM deployment from new template fails Hi Simon, Good day to you, and thank you for your reply. I apologise for the delay in my reply. I have run the health check on the SSVM and the result seems to be OK. === root@s-2985-VM:/var/log/cloud# /usr/local/cloud/systemvm/ssvm-check.sh ================================================ First DNS server is 8.8.8.8 PING 8.8.8.8 (8.8.8.8): 56 data bytes 64 bytes from 8.8.8.8<http://8.8.8.8>: icmp_seq=0 ttl=58 time=2.173 ms 64 bytes from 8.8.8.8<http://8.8.8.8>: icmp_seq=1 ttl=58 time=2.126 ms --- 8.8.8.8 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max/stddev = 2.126/2.150/2.173/0.024 ms Good: Can ping DNS server ================================================ Good: DNS resolves download.cloud.com<http://download.cloud.com> ================================================ NFS is currently mounted Mount point is /mnt/SecStorage/cb10c4c8-84a1-3eee-b983-36b6fc8f7057 Good: Can write to mount point Mount point is /mnt/SecStorage/41cab17f-4dbc-37a5-b8d2-cdc153301319 Good: Can write to mount point ================================================ Management server is *.*.3.3. Checking connectivity. Good: Can connect to management server port 8250 ================================================ Good: Java process is running ================================================ Tests Complete. Look for ERROR or WARNING above. === No error and warning. I also checked /var/log/cloud/cloud.out log file and can't find anything. Do you think destroying and recreating the SSVM would help? >From what I can see, deploying VM from existing templates are OK, only >deploying VM from new templates (which was just created and not yet copied >over to the primary storage, I think) fails. So I think the problem is caused >during the copying from secondary storage to primary storage. But I don't know >what causes the copying to fail. 
- Si

________________________________
From: Indra Pramana <[email protected]>
Sent: Thursday, February 18, 2016 5:05 AM
To: [email protected]
Cc: Simon Weller
Subject: Re: VM deployment from new template fails

Hi Simon,

Good day to you, and thank you for your reply. I apologise for the delay in my reply.

I have run the health check on the SSVM and the result seems to be OK:

===
root@s-2985-VM:/var/log/cloud# /usr/local/cloud/systemvm/ssvm-check.sh
================================================
First DNS server is 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=58 time=2.173 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=58 time=2.126 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.126/2.150/2.173/0.024 ms
Good: Can ping DNS server
================================================
Good: DNS resolves download.cloud.com
================================================
NFS is currently mounted
Mount point is /mnt/SecStorage/cb10c4c8-84a1-3eee-b983-36b6fc8f7057
Good: Can write to mount point
Mount point is /mnt/SecStorage/41cab17f-4dbc-37a5-b8d2-cdc153301319
Good: Can write to mount point
================================================
Management server is *.*.3.3. Checking connectivity.
Good: Can connect to management server port 8250
================================================
Good: Java process is running
================================================
Tests Complete. Look for ERROR or WARNING above.
===

No errors or warnings. I also checked the /var/log/cloud/cloud.out log file and can't find anything. Do you think destroying and recreating the SSVM would help?

From what I can see, deploying VMs from existing templates works; only deploying a VM from a new template (one that was just created and, I think, not yet copied over to the primary storage) fails. So I believe the problem occurs during the copy from secondary storage to primary storage, but I don't know what causes the copy to fail.

This is another attempt I did today:

====
2016-02-18 17:44:06,328 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 34-1602142919: Received: { Ans: , MgmtId: 161342671900, via: 34, Ver: v1, Flags: 110, { Answer, Answer } }
2016-02-18 17:44:06,330 DEBUG [cloud.network.NetworkModelImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Service SecurityGroup is not supported in the network id=238
2016-02-18 17:44:06,333 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Checking if we need to prepare 2 volumes for VM[User|test-centos67-20160218]
2016-02-18 17:44:06,348 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) template 514 is already in store:46, type:Image
2016-02-18 17:44:06,354 DEBUG [storage.datastore.PrimaryDataStoreImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Not found (templateId:514poolId:214) in template_spool_ref, persisting it
2016-02-18 17:44:06,384 DEBUG [storage.image.TemplateDataFactoryImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) template 514 is already in store:214, type:Primary
2016-02-18 17:44:06,387 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Found template ebe8d28c-03bc-49b6-8b21-442ecfff8270 in storage pool 214 with VMTemplateStoragePool id: 327
2016-02-18 17:44:06,398 DEBUG [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Acquire lock on VMTemplateStoragePool 327 with timeout 3600 seconds
2016-02-18 17:44:06,409 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) lock is acquired for VMTemplateStoragePool 327
2016-02-18 17:44:06,449 DEBUG [storage.motion.AncientDataMotionStrategy] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) copyAsync inspecting src type TEMPLATE copyAsync inspecting dest type TEMPLATE
2016-02-18 17:44:06,466 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 37-292022280: Sending { Cmd , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/2/514/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2","uuid":"790ef1f6-8b80-4305-b561-5397d43295ad","id":514,"format":"QCOW2","accountId":2,"hvm":true,"displayText":"CentOS 6.7 (64-bit) 20160218","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://*.*.11.31/mnt/vol1/sec-storage2","_role":"Image"}},"name":"ebe8d28c-03bc-49b6-8b21-442ecfff8270","hypervisorType":"KVM"}},"destTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"uuid":"790ef1f6-8b80-4305-b561-5397d43295ad","id":514,"format":"QCOW2","accountId":2,"hvm":true,"displayText":"CentOS 6.7 (64-bit) 20160218","imageDataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"d433809b-01ea-3947-ba0f-48077244e4d6","id":214,"poolType":"RBD","host":"*.*.com","path":"*-*-01","port":6789}},"name":"ebe8d28c-03bc-49b6-8b21-442ecfff8270","hypervisorType":"KVM"}},"executeInSequence":true,"wait":10800}}] }
2016-02-18 17:47:24,121 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 37-292022280: Received: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 110, { CopyCmdAnswer } }
2016-02-18 17:47:24,176 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) releasing lock for VMTemplateStoragePool 327
2016-02-18 17:47:24,177 WARN [utils.db.Merovingian2] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Was unable to find lock for the key template_spool_ref327 and thread id 1847536824
2016-02-18 17:47:24,177 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Unable to create Vol[7687|vm=4070|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 to 062cc44b-2339-49a4-9142-b55f5e0dee7d
2016-02-18 17:47:24,177 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Unable to contact resource.
2016-02-18 17:47:24,199 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Cleaning up resources for the vm VM[User|test-centos67-20160218] in Starting state
2016-02-18 17:47:24,202 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 68-1775429003: Sending { Cmd , MgmtId: 161342671900, via: 68, Ver: v1, Flags: 100111, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"i-1976-4070-VM","wait":0}}] }
2016-02-18 17:47:24,860 DEBUG [agent.transport.Request] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Seq 68-1775429003: Received: { Ans: , MgmtId: 161342671900, via: 68, Ver: v1, Flags: 110, { StopAnswer } }
2016-02-18 17:47:24,877 DEBUG [cloud.network.NetworkManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Changing active number of nics for network id=238 on -1
2016-02-18 17:47:24,890 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Successfully released network resources for the vm VM[User|test-centos67-20160218]
2016-02-18 17:47:24,890 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-36:job-41152 = [ c5e1cd55-ed2b-4935-9442-961c53de4c3f ]) Successfully cleanued up resources for the vm VM[User|test-centos67-20160218] in Starting state
=====

The two significant error messages I noted were:

- Failed to copy /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 to 062cc44b-2339-49a4-9142-b55f5e0dee7d; and
- Was unable to find lock for the key template_spool_ref327 and thread id 1847536824

I am not sure whether the copy failed because of a locking issue. Can anyone advise?
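In the meantime, to get at the underlying error, I plan to try reproducing the copy manually from the host that handled the CopyCommand, roughly along these lines. The source path and volume name come from the log above; the RBD pool name and cephx user are placeholders, and the id=/conf= options only apply if cephx authentication is enabled:

===
# Re-run the qcow2 -> raw conversion onto RBD by hand, on the KVM host
qemu-img convert -f qcow2 -O raw \
  /mnt/45f0a5a0-2ae4-319d-9dae-f11b1c2c585e/ebe8d28c-03bc-49b6-8b21-442ecfff8270.qcow2 \
  "rbd:<pool-name>/test-062cc44b:id=<cephx-user>:conf=/etc/ceph/ceph.conf"

# Verify the test image arrived on the pool, then remove it again
rbd ls -p <pool-name> | grep test-062cc44b
rbd rm <pool-name>/test-062cc44b
===

If the manual conversion fails too, whatever qemu-img prints should be more specific than the generic "Failed to copy" in the management server log.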
Looking forward to your reply, thank you.

Cheers.

On Mon, Jan 25, 2016 at 9:29 PM, Simon Weller <[email protected]> wrote:

Have you done any debugging on the Secondary Storage VM yet? That's probably where I would focus. Here's a guide:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting

- Si

________________________________________
From: Indra Pramana <[email protected]>
Sent: Sunday, January 24, 2016 9:02 PM
To: [email protected]; [email protected]
Subject: VM deployment from new template fails

Dear all,

We are using CloudStack 4.2.0 with the KVM hypervisor and Ceph RBD primary storage. Since earlier this month, we have encountered a strange problem: we cannot deploy a new VM from any newly created template, while deploying VMs from existing templates still works. This is new behaviour -- we had no issues creating new templates and deploying VMs from them until late last month (December 2015).

These are the error messages from the CloudStack management server. It seems that the process of copying the template (in .qcow2 format) from the secondary storage to the primary storage fails:

===
2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (AgentManager-Handler-5:null) Seq 70-1791707721: Processing: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7","wait":0}}] }
2016-01-24 20:54:52,044 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-5:null) Seq 70-1791707730: Sending now. is current sequence.
2016-01-24 20:54:52,044 DEBUG [agent.transport.Request] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Seq 70-1791707721: Received: { Ans: , MgmtId: 161342671900, via: 70, Ver: v1, Flags: 110, { CopyCmdAnswer } }
2016-01-24 20:54:52,065 DEBUG [agent.transport.Request] (AgentManager-Handler-6:null) Seq 37-2102353958: Processing: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2469)\n\tat com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1230)\n\tat com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat com.cloud.utils.nio.Task.run(Task.java:83)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:679)\n","wait":0}}] }
2016-01-24 20:54:52,066 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 37-2102353958: Received: { Ans: , MgmtId: 161342671900, via: 37, Ver: v1, Flags: 10, { Answer } }
2016-01-24 20:54:52,123 INFO [storage.volume.VolumeServiceImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) releasing lock for VMTemplateStoragePool 323
2016-01-24 20:54:52,123 WARN [utils.db.Merovingian2] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Was unable to find lock for the key template_spool_ref323 and thread id 1999363220
2016-01-24 20:54:52,123 DEBUG [cloud.storage.VolumeManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7
2016-01-24 20:54:52,124 INFO [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Unable to contact resource.
com.cloud.exception.StorageUnavailableException: Resource [StoragePool:214] is unreachable: Unable to create Vol[7508|vm=4019|ROOT]:com.cloud.utils.exception.CloudRuntimeException: Failed to copy /mnt/a04d624b-e428-3448-aa22-e1e30f3a36e5/816449ef-dab0-4f4e-887f-7c51b2e508f7.qcow2 to 0aa1ac29-cec2-4a61-883c-27bd15f577c7
        at com.cloud.storage.VolumeManagerImpl.recreateVolume(VolumeManagerImpl.java:2544)
        at com.cloud.storage.VolumeManagerImpl.prepare(VolumeManagerImpl.java:2592)
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:889)
        at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578)
        at org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227)
        at org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966)
        at com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952)
        at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
        at org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158)
        at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)
2016-01-24 20:54:52,152 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-29:job-39302 = [ 9d226ae4-02b7-4550-9ec5-cafca1fa736e ]) Cleaning up resources for the vm VM[User|test-ubuntu14-20160124a] in Starting state
===

On the hypervisor host, I can't find any useful clues in the agent.log file, other than some messages stating that it tries to stop the VM (which doesn't exist yet, since it is still being created):
===
2016-01-24 20:54:52,161 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: KVM, for vm: i-1939-4019-VM, continue
2016-01-24 20:54:52,162 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find connection: LXC, for vm: i-1939-4019-VM, continue
2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) can't find which hypervisor the vm used , then use the default hypervisor
2016-01-24 20:54:52,163 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,164 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 'i-1939-4019-VM'
2016-01-24 20:54:52,165 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname i-1939-4019-VM
2016-01-24 20:54:54,682 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful.
2016-01-24 20:54:54,683 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Try to stop the vm at first
2016-01-24 20:54:54,685 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) VM i-1939-4019-VM doesn't exist, no need to stop it
===

Connectivity from the particular hypervisor host (as well as from our other hypervisor hosts) to both the primary and secondary storage is fine: I can mount folders from the secondary storage, and the connection to the RBD primary storage is also OK, since our currently running VMs have no issues.
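For reference, these are roughly the checks I ran. The NFS server, export path, and pool name are masked placeholders here, and the Ceph commands assume the client keyring used by the CloudStack agent is available on the host:

===
# From a hypervisor host: verify the secondary storage NFS export mounts
showmount -e <nfs-server>
mkdir -p /mnt/secstorage-test
mount -t nfs <nfs-server>:<export-path> /mnt/secstorage-test

# From the same host: verify the Ceph cluster and RBD pool are reachable
ceph -s
rbd ls -p <pool-name>
===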
Any advice on how I can further troubleshoot and resolve this problem is greatly appreciated. We are not able to release new templates because of this problem.

Thank you.

Best regards,
-ip-