Hi,

Today I set up OpenStack on my development machine inside VirtualBox, following
Martin's tutorial at hastexo.com. I forgot to change libvirt_type to qemu, and
since KVM isn't available inside VirtualBox, nova-compute understandably failed
to boot the VM I created.
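
(For anyone else running this inside VirtualBox: whether kvm can work at all
comes down to whether the KVM device node, normally /dev/kvm, exists on the
host. Below is a rough Python sketch of that sanity check. It assumes an
ini-style config with libvirt_type in the [DEFAULT] section, and the function
names are my own, not part of nova.)

```python
import configparser
import os
from typing import Optional


def supported_libvirt_type(kvm_device: str = "/dev/kvm") -> str:
    # KVM needs hardware virtualization exposed via the device node
    # (assumed to be /dev/kvm); inside VirtualBox it is normally missing,
    # so only plain qemu emulation will work.
    return "kvm" if os.path.exists(kvm_device) else "qemu"


def configured_libvirt_type(conf_text: str) -> Optional[str]:
    # Assumes an ini-style config with the option in [DEFAULT];
    # flagfile-style lines such as "--libvirt_type=kvm" are not parsed here.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser["DEFAULT"].get("libvirt_type")


def check_config(conf_text: str, kvm_device: str = "/dev/kvm") -> Optional[str]:
    """Return a warning if the config asks for kvm on a host without KVM."""
    configured = configured_libvirt_type(conf_text)
    if configured == "kvm" and supported_libvirt_type(kvm_device) != "kvm":
        return f"libvirt_type=kvm but {kvm_device} is missing; use qemu instead"
    return None
```

Running check_config() on the text of both nova.conf and nova-compute.conf
(adjust the paths to your setup) would have caught my mistake before the first
boot attempt.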

I changed the value in nova.conf (and in nova-compute.conf as well) and
restarted the nova services, expecting everything to boot up correctly now.
But nova-compute didn't recover at all; instead, I got this exception in the logs:

2012-03-27 14:59:01 CRITICAL nova [-] Instance instance-00000001 could not be found.
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE:   File "/usr/bin/nova-compute", line 49, in <module>
(nova): TRACE:     service.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 413, in wait
(nova): TRACE:     _launcher.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 131, in wait
(nova): TRACE:     service.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
(nova): TRACE:     return self._exit_event.wait()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
(nova): TRACE:     return hubs.get_hub().switch()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
(nova): TRACE:     return self.greenlet.switch()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
(nova): TRACE:     result = function(*args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 101, in run_server
(nova): TRACE:     server.start()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in start
(nova): TRACE:     self.manager.init_host()
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 247, in init_host
(nova): TRACE:     self.reboot_instance(context, instance['uuid'])
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 153, in decorated_function
(nova): TRACE:     function(self, context, instance_uuid, *args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
(nova): TRACE:     return function(self, context, instance_uuid, *args, **kwargs)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 898, in reboot_instance
(nova): TRACE:     reboot_type)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova): TRACE:     return f(*args, **kw)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 753, in reboot
(nova): TRACE:     if self._soft_reboot(instance):
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 773, in _soft_reboot
(nova): TRACE:     dom = self._lookup_by_name(instance.name)
(nova): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 1567, in _lookup_by_name
(nova): TRACE:     raise exception.InstanceNotFound(instance_id=instance_name)
(nova): TRACE: InstanceNotFound: Instance instance-00000001 could not be found.
(nova): TRACE: 

My guess is that nova-compute failed at a point where failure wasn't expected
and left the data for this VM in an inconsistent state.
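
Reading the traceback, init_host() calls reboot_instance() for the instance at
startup, _lookup_by_name() raises InstanceNotFound (presumably because libvirt
never got a domain for instance-00000001), and nothing on the way up catches it,
so it reaches service.wait() and nova-compute exits. To illustrate what I'd
consider graceful handling, here is a rough Python sketch. Instance,
InstanceNotFound, init_host() and the reboot callback are hypothetical
stand-ins for illustration, not nova's actual code:

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List

LOG = logging.getLogger("init_host_sketch")


class InstanceNotFound(Exception):
    """Hypothetical stand-in for nova.exception.InstanceNotFound."""


@dataclass
class Instance:
    """Hypothetical minimal instance record: just what the sketch needs."""
    uuid: str
    name: str


def init_host(instances: Iterable[Instance],
              reboot: Callable[[Instance], None]) -> List[str]:
    """Resume each instance without letting one failure kill the service.

    Returns the UUIDs of instances the hypervisor doesn't know about, so
    they can be flagged for cleanup instead of crashing nova-compute.
    """
    missing = []
    for instance in instances:
        try:
            reboot(instance)
        except InstanceNotFound:
            # Log and move on to the next instance instead of propagating.
            LOG.warning("Instance %s (%s) not found in hypervisor, skipping",
                        instance.name, instance.uuid)
            missing.append(instance.uuid)
    return missing
```

With a reboot callback that raises InstanceNotFound for instance-00000001,
this init_host() reports that instance's UUID and carries on with the rest
instead of taking the whole service down.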

I found no way to recover from this failure. I tried simply running "nova delete"
on the machine, checked a few minutes later with "nova show", and saw:

OS-DCF:diskConfig                   | MANUAL
OS-EXT-SRV-ATTR:host                | vagrant-precise64
OS-EXT-SRV-ATTR:hypervisor_hostname | None
OS-EXT-SRV-ATTR:instance_name       | instance-00000001
OS-EXT-STS:power_state              | 8
OS-EXT-STS:task_state               | deleting
OS-EXT-STS:vm_state                 | active
...

That looks right, but the deletion never finishes, and nothing at all appears
in the logs. "nova list" still shows the instance with "Status: ACTIVE".
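
The telltale combination is task_state "deleting" while vm_state is still
"active"; if that persists minutes after "nova delete", the request is stuck.
A small Python sketch for spotting it in "nova show" output. The function names
are my own, and the parsing assumes the "property | value" layout shown above:

```python
from typing import Dict


def parse_nova_show(text: str) -> Dict[str, str]:
    """Turn "property | value" lines from nova show output into a dict.

    Handles both the full table ("| key | value |") and the trimmed form
    pasted above; border lines ("+---+") and lines without exactly one
    separator (such as "...") are skipped.
    """
    props: Dict[str, str] = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0]:
            props[cells[0]] = cells[1]
    return props


def deletion_pending(props: Dict[str, str]) -> bool:
    """True while a delete has been requested but the VM is still active."""
    return (props.get("OS-EXT-STS:task_state") == "deleting"
            and props.get("OS-EXT-STS:vm_state") == "active")
```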

Next, I stopped nova, deleted the instance directory in /var/lib/nova/instances
and restarted nova, but that didn't help either (same exception).
I then stopped nova again, deleted the VM's rows from the instances,
security_group_instance_association and instance_info_caches tables in nova's
MySQL database, and restarted nova, only to get a different exception in the logs:

2012-03-27 14:55:11 ERROR nova.rpc.amqp [req-26b4686a-85f4-4566-bb0f-d87e8456b1f2 6b177562cbc1434fade182a45427134d 3a21af5fa5fc470ebe2f2471ff5b49d3] Exception during message handling
(nova.rpc.amqp): TRACE: Traceback (most recent call last):
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
(nova.rpc.amqp): TRACE:     rval = node_func(context=ctxt, **node_args)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE:     return f(*args, **kw)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 142, in decorated_function
(nova.rpc.amqp): TRACE:     locked = self.get_lock(context, instance_uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.rpc.amqp): TRACE:     return f(*args, **kw)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
(nova.rpc.amqp): TRACE:     return function(self, context, instance_uuid, *args, **kwargs)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1597, in get_lock
(nova.rpc.amqp): TRACE:     instance_ref = self.db.instance_get_by_uuid(context, instance_uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 549, in instance_get_by_uuid
(nova.rpc.amqp): TRACE:     return IMPL.instance_get_by_uuid(context, uuid)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper
(nova.rpc.amqp): TRACE:     return f(*args, **kwargs)
(nova.rpc.amqp): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1345, in instance_get_by_uuid
(nova.rpc.amqp): TRACE:     raise exception.InstanceNotFound(instance_id=uuid)
(nova.rpc.amqp): TRACE: InstanceNotFound: Instance 73e90a02-7cef-4d64-a369-fbbc668ea91c could not be found.

Of course, since this is a development machine, I can just reset the whole
database and start over. But shouldn't nova-compute handle this kind of failure
(or any failure) more gracefully?
And is there a way to recover cleanly from this situation?

Best regards,
Philipp


_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
