Public bug reported:

Controller Node:
OS: openSUSE13.2
python-nova-2015.1.1.dev62-1.1.noarch
openstack-nova-conductor-2015.1.1.dev62-1.1.noarch
openstack-nova-scheduler-2015.1.1.dev62-1.1.noarch
openstack-nova-cert-2015.1.1.dev62-1.1.noarch
python-novaclient-2.23.0-2.4.noarch
openstack-nova-novncproxy-2015.1.1.dev62-1.1.noarch
openstack-nova-api-2015.1.1.dev62-1.1.noarch
openstack-nova-consoleauth-2015.1.1.dev62-1.1.noarch
openstack-nova-2015.1.1.dev62-1.1.noarch

Compute Node:
OS: openSUSE13.1
openstack-nova-compute-2014.2.4.dev56-1.1.noarch
python-novaclient-2.20.0-2.3.noarch
python-nova-2014.2.4.dev56-1.1.noarch
openstack-nova-2014.2.4.dev56-1.1.noarch

During the installation of OpenStack using a Kilo Controller node, a Kilo Network node and a Juno compute node, I found that the compute node was not registering the hypervisor with the controller. The hypervisor-list output was empty, but the service-list output showed the compute node.

After tracking through the code I found the root of the issue:

During nova-compute startup, the compute node checks whether it has already registered with the controller by querying both the services and compute_nodes tables. I noticed that the _get_service call was raising an exception.

Call flow on the compute node I was looking at:

/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py
    update_available_resource
    _update_available_resource
    _init_compute_node
    _get_service   <---------------- NotFound exception caught here
    self.conductor_api.service_get_by_compute_host(context,self.host)
    conductor/api.py:service_get_all_by

Looking on the controller to determine the source of the exception, I found where the request is handled:

/usr/lib/python2.7/site-packages/nova/conductor/manager.py -> service_get_all_by()

In this function the topic coming in is 'compute', so it is assumed to be a request from a Juno compute node. The services table query succeeds, but apparently Juno compute nodes also expect a compute_node field in the response that I presume is not present in Kilo. The function proceeds to add the field and queries the compute_nodes table to determine whether the host already exists there. This is fine if the host is present in that table, but if it is not present, an exception is thrown that is not handled. This causes service_get_all_by to not return a result. The exception propagates all the way back to the compute node, resulting in the hypervisor not being registered with the controller.

I was able to resolve this by catching the exception in service_get_all_by, creating the expected field, and defaulting it to None.

    if topic == 'compute':
        result = self.db.service_get_by_compute_host(context, host)
        # NOTE(sbauza): Only Juno computes are still calling this
        # conductor method for getting service_get_by_compute_node,
        # but expect a compute_node field so we can safely add it.
        try:
            result['compute_node'] = objects.ComputeNodeList.get_all_by_host(
                context, result['host'])
            # FIXME(comstud) Potentially remove this on bump to v3.0
            result = [result]
        except Exception:
            result['compute_node'] = None
            result = [result]

Not sure if this is the correct fix or not, but this unblocked me.

** Affects: nova
     Importance: Undecided
         Status: New

** Tags: nova-compute nova-controller

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1477261

Title:
  Juno Compute node unable to register hypervisor with Kilo Controller

Status in OpenStack Compute (nova):
  New
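
A narrower variant of the workaround in this report can be sketched in isolation. This is a minimal, self-contained illustration and not nova code: NotFound, lookup_compute_nodes and service_with_compute_node are hypothetical stand-ins, and it assumes the missing-host lookup raises a NotFound-style exception, as the call flow in the report suggests. It catches only that exception, so unrelated failures (for example a database error) would still surface instead of being silently turned into None.

```python
class NotFound(Exception):
    """Stand-in for the NotFound-style error mentioned in the report."""


def lookup_compute_nodes(compute_nodes_table, host):
    """Hypothetical stand-in for the compute_nodes table lookup.

    Raises NotFound when the host has no row yet (assumed behaviour).
    """
    nodes = [node for node in compute_nodes_table if node["host"] == host]
    if not nodes:
        raise NotFound("no compute node for host %s" % host)
    return nodes


def service_with_compute_node(service, compute_nodes_table):
    """Return [service] with the compute_node field an older caller expects."""
    result = dict(service)  # copy so the caller's record is not mutated
    try:
        result["compute_node"] = lookup_compute_nodes(
            compute_nodes_table, result["host"])
    except NotFound:
        # Host not registered yet: default the field instead of failing,
        # so the caller can go on to create its compute node record.
        result["compute_node"] = None
    return [result]


if __name__ == "__main__":
    service = {"host": "compute1", "topic": "compute"}
    # Host missing from the table: field is present but None.
    print(service_with_compute_node(service, []))
    # Host present: field carries the matching rows.
    print(service_with_compute_node(service, [{"host": "compute1", "id": 1}]))
```

Whether None is the right default for a Juno caller is the same open question raised in the report; the sketch only shows how to limit the scope of the except clause.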

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1477261/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp