Reviewed:  https://review.opendev.org/667687
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=818419c9d313bd6151d1a05b3a087a15116f61b8
Submitter: Zuul
Branch:    master
commit 818419c9d313bd6151d1a05b3a087a15116f61b8
Author: Matt Riedemann <[email protected]>
Date:   Wed Jun 26 13:26:44 2019 -0400

    Fix AttributeError in RT._update_usage_from_migration

    Change Ieb539c9a0cfbac743c579a1633234537a8e3e3ee in Stein added some
    logging in _update_usage_from_migration to log the flavor for an
    inbound and outbound migration. If an instance is resized and the
    resize is then immediately confirmed, it's possible to race with
    ComputeManager._confirm_resize setting instance.old_flavor to None
    before the migration status is changed to "confirmed" while the
    update_available_resource periodic is running, which will result in
    _update_usage_from_migration hitting an AttributeError when trying to
    log instance.old_flavor.flavorid, since instance.old_flavor is None.

    There are a few key points there:

    - We get into _update_usage_from_migration because the
      _update_available_resource method gets in-progress migrations
      related to the host (in this case the source compute), and the
      migration is considered in-progress until its status is
      "confirmed".

    - The instance is not in the tracked_instances dict when
      _update_usage_from_migration runs because the RT only tracks
      instances whose instance.host matches RT.host, and in this case the
      instance has been resized to another compute, so instance.host
      points at the dest compute.

    The fix here is to simply check whether we got instance.old_flavor
    and skip the log message if we do not have it, which gets us back to
    the old behavior.

    This bug was found by noticing it in CI job logs - there is a link to
    hits in logstash in the bug report. As for the "incoming and not
    tracked" case in _update_usage_from_migration, I have not modified
    that, since I am not sure we have the same race, nor have I seen it
    in CI logs.

    Change-Id: I43e34b3ff1424d42632a3e8f842c93508905aa1a
    Closes-Bug: #1834349

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1834349

Title:
  Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.:
  AttributeError: 'NoneType' object has no attribute 'flavorid' - race
  with resize confirm

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Seeing this in CI jobs:

  http://logs.openstack.org/45/666845/1/check/neutron-tempest-dvr-ha-multinode-full/5b09053/controller/logs/screen-n-cpu.txt#_Jun_26_13_51_48_973568

  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Error updating resources for node ubuntu-bionic-ovh-bhs1-0008373888.: AttributeError: 'NoneType' object has no attribute 'flavorid'
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager Traceback (most recent call last):
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 8101, in _update_available_resource_for_node
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     startup=startup)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 735, in update_available_resource
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/usr/local/lib/python3.6/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     return f(*args, **kwargs)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 783, in _update_available_resource
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     self._update_usage_from_migrations(context, migrations, nodename)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1192, in _update_usage_from_migrations
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     nodename)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1118, in _update_usage_from_migration
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager     migration.uuid, itype.flavorid, instance=instance)
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager AttributeError: 'NoneType' object has no attribute 'flavorid'
  Jun 26 13:51:48.973568 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: ERROR nova.compute.manager

  I'm not sure if this is a race, but it's this code:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1118

  which was introduced with this change (of mine) in Stein:

  https://github.com/openstack/nova/commit/20a46ece5701c9798a5e0df12c944237cb1ece3e

  According to logstash this is only showing up in multinode jobs:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'NoneType'%20object%20has%20no%20attribute%20'flavorid'%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

  Given this code to get the
flavor:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L1439

      def _get_instance_type(self, instance, prefix, migration):
          """Get the instance type from instance."""
          stashed_flavors = migration.migration_type in ('resize',)
          if stashed_flavors:
              return getattr(instance, '%sflavor' % prefix)
          else:
              # NOTE(ndipanov): Certain migration types (all but resize)
              # do not change flavors so there is no need to stash
              # them. In that case - just get the instance flavor.
              return instance.flavor

  And this in the logs to load the instance.old_flavor:

  Jun 26 13:51:48.911052 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: INFO nova.compute.resource_tracker [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-8e634f343eef] Updating resource usage from migration 87280d26-3511-47cb-9a69-6d510fd92718
  Jun 26 13:51:48.911397 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: DEBUG nova.objects.instance [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Lazy-loading 'old_flavor' on Instance uuid 4342d535-d00f-4ebc-930a-8e634f343eef {{(pid=27912) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1053}}

  This is coming from a resize operation. I also see this:

  Jun 26 13:51:48.844575 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: DEBUG nova.compute.resource_tracker [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] Migration for instance 4342d535-d00f-4ebc-930a-8e634f343eef refers to another host's instance! {{(pid=27912) _pair_instances_to_migrations /opt/stack/nova/nova/compute/resource_tracker.py:751}}

  which was added because of https://review.opendev.org/#/c/382195/, so we
  might just be racing with the confirm/revert during the
  update_available_resource task and the resize was confirmed/reverted.
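The failure mode quoted above, and the shape of the eventual fix, can be shown in miniature. The classes below are hypothetical stand-ins, not nova's real objects: for a resize, the helper returns the stashed old flavor, which a racing _confirm_resize may already have set to None, and dereferencing .flavorid on that None is exactly the AttributeError in the traceback.

```python
# Hypothetical stand-ins (not nova's real objects) reproducing the bug.

class Migration:
    migration_type = 'resize'


class Instance:
    flavor = 'new-flavor'
    old_flavor = None  # already nulled by the racing _confirm_resize


def get_instance_type(instance, prefix, migration):
    # Mirrors the helper quoted above: resize stashes flavors; other
    # migration types just use the current instance.flavor.
    if migration.migration_type in ('resize',):
        return getattr(instance, '%sflavor' % prefix)
    return instance.flavor


itype = get_instance_type(Instance(), 'old_', Migration())

# Unguarded access is what blew up in _update_usage_from_migration:
try:
    itype.flavorid
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'flavorid'

# The fix amounts to a guard of this shape before logging:
if itype is not None:
    print('migration uses flavor %s' % itype.flavorid)
else:
    print('flavor not available; skipping the log message')
```

Note that the guard only skips the log message; the rest of the usage accounting proceeds as before, which is why the commit message describes it as restoring the old behavior.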
  Further up in the logs I also see this, confirming the instance is
  being resized:

  Jun 26 13:51:48.559335 ubuntu-bionic-ovh-bhs1-0008373888 nova-compute[27912]: INFO nova.virt.libvirt.driver [None req-903b3e73-3ce3-4f5c-9a30-811383077679 None None] [instance: 4342d535-d00f-4ebc-930a-8e634f343eef] Periodic task is updating the host stats; it is trying to get disk info for instance-0000006c, but the backing disk was removed by a concurrent operation (task_state=None) and (vm_state=resized)

  When confirming a resize (which tempest will do as soon as the server
  status goes to VERIFY_RESIZE in most tests), the instance.old_flavor is
  nulled out here:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4031

  And then the move claim is dropped on the source compute:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/manager.py#L4057

  and that's where we remove it from tracked_migrations in the RT:

  https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/compute/resource_tracker.py#L464

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1834349/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
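Putting the pieces together, the race window described above can be sketched as a timeline. This is a deliberately simplified, hypothetical simulation (the real code paths are the links in the bug description): _confirm_resize clears instance.old_flavor before the migration status becomes "confirmed", so a periodic update_available_resource run in that window still treats the migration as in-progress but finds no stashed flavor, and the instance is not in tracked_instances because its host already points at the dest compute.

```python
# Simplified, hypothetical timeline of the race; not nova's real objects.

class Instance:
    def __init__(self):
        self.host = 'dest-compute'         # resize already moved the instance
        self.old_flavor = 'stashed-flavor'


class Migration:
    def __init__(self):
        self.status = 'finished'           # not yet "confirmed"


instance, migration = Instance(), Migration()

# Confirm step 1: null out the stashed flavor.
instance.old_flavor = None

# --- periodic update_available_resource fires in this window ---
in_progress = migration.status != 'confirmed'  # True: migration still tracked
tracked = instance.host == 'source-compute'    # False: not in tracked_instances
print(in_progress, tracked, instance.old_flavor)

# Confirm step 2: only now is the migration marked confirmed, closing the
# window in which old_flavor is None but the migration looks in-progress.
migration.status = 'confirmed'
```

The ordering of the two confirm steps is the whole bug: any periodic run between them sees an in-progress resize migration whose stashed flavor has already vanished.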

