Public bug reported:

Description
===========
If the resource tracker's _update_available_resource() is called between
_do_rebuild_instance_with_claim() and instance.save() while a VM instance
is being evacuated to the destination host:

nova/compute/manager.py

    2931     def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
    2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,----------
    3016             claim_ctxt = rebuild_claim(
    3017                 context, instance, scheduled_node,
    3018                 limits=limits, image_meta=image_meta,
    3019                 migration=migration)
    3020             self._do_rebuild_instance_with_claim(
    3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,--------
    3068             instance.apply_migration_context()
    3069             # NOTE (ndipanov): This save will now update the host and node
    3070             # attributes making sure that next RT pass is consistent since
    3071             # it will be based on the instance and not the migration DB
    3072             # entry.
    3073             instance.host = self.host
    3074             instance.node = scheduled_node
    3075             instance.save()
    3076             instance.drop_migration_context()

the instance is not handled as an instance managed by the destination
host, because its host has not been updated in the DB yet.

2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance 22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}. Skipping heal of allocation because we do not know what to do.

As a result, the PCI device backing the SR-IOV port was freed by
clean_usage() even though the VM already holds the VF port.

    743     def _update_available_resource(self, context, resources):
    744 +-- 45 lines: # initialize the compute node object, creating it------
    789         self.pci_tracker.clean_usage(instances, migrations, orphans)
    790         dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()

After that, when we evacuated this VM to another compute host again, we
got the error shown below.

Steps to reproduce
==================
1. create a VM on com1 with SR-IOV VF ports
2. stop and disable the nova-compute service on com1
3. wait 60 sec (nova-compute reporting interval)
4. evacuate the VM to com2
5. wait until the VM is active on com2
6. enable and start nova-compute on com1
7. wait 60 sec (nova-compute reporting interval)
8. stop and disable the nova-compute service on com2
9. wait 60 sec (nova-compute reporting interval)
10. evacuate the VM to com1
11. wait until the VM is active on com1
12. enable and start nova-compute on com2
13. wait 60 sec (nova-compute reporting interval)
14. go to step 2

Expected result
===============
Evacuation should complete without errors.

Actual result
=============
Evacuation failed with "Port update failed".

Environment
===========
openstack-nova-compute-18.0.1-1 with SR-IOV ports.
libvirt is used.

Logs & Configs
==============
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     yield
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration, request_spec)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     self._do_rebuild_instance(*args, **kwargs)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     context, instance, self.host, migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     pci_slot)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
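
The ordering problem reported in the Description can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyPciTracker, managed_instances() and evacuate() are hypothetical stand-ins written for illustration, not nova code, and the model ignores the migration records and orphans that the real clean_usage() also receives.

```python
# Toy model only: every name here is hypothetical and is NOT nova code.


class ToyPciTracker:
    """Tracks which instance UUID owns each PCI address on one host."""

    def __init__(self, devices):
        # Map each PCI address to its owning instance UUID (None = free).
        self.owner = {addr: None for addr in devices}

    def claim(self, addr, instance_uuid):
        self.owner[addr] = instance_uuid

    def clean_usage(self, managed_uuids):
        # Free every device whose owner is not managed by this host.
        for addr, uuid in self.owner.items():
            if uuid is not None and uuid not in managed_uuids:
                self.owner[addr] = None


def managed_instances(db, host):
    """Return the UUIDs whose DB record says they live on `host`."""
    return {uuid for uuid, h in db.items() if h == host}


def evacuate(periodic_runs_before_save):
    """Evacuate vm-1 from com1 to com2 and return the final device owner."""
    db = {"vm-1": "com1"}                      # instance.host in the DB
    tracker = ToyPciTracker(["0000:05:12.2"])  # PCI tracker on com2

    # The rebuild claim assigns the VF on com2 to the instance.
    tracker.claim("0000:05:12.2", "vm-1")

    if periodic_runs_before_save:
        # Periodic task fires while the DB still says the VM is on com1.
        tracker.clean_usage(managed_instances(db, "com2"))

    db["vm-1"] = "com2"                        # stands in for instance.save()

    if not periodic_runs_before_save:
        # Periodic task fires after the DB was updated.
        tracker.clean_usage(managed_instances(db, "com2"))

    return tracker.owner["0000:05:12.2"]


if __name__ == "__main__":
    print("periodic task before save:", evacuate(True))
    print("periodic task after save: ", evacuate(False))
```

In this model the device ends up free when the periodic task runs before the save and stays assigned to the VM otherwise, which matches the window described in the report.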
https://bugs.launchpad.net/bugs/1896463

Title:
  evacuation failed: Port update failed : Unable to correlate PCI slot

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896463/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp