Public bug reported:

Title: nova-scheduler fails to acquire lock on hosts on live migration
Bug link: https://bugs.launchpad.net/bugs/2076228
Status in OpenStack Compute (nova): New

Description
===========
I am running OpenStack Antelope as a charmed deployment managed with Juju, on Ceph-backed storage. Antelope was upgraded from Zed; the original deployment and the upgrade both followed the official OpenStack Charms guide. All hosts run identical hardware: Dell PowerEdge R610 with 24 cores and 48 GB of RAM.

I tried --live-migration with volume-backed VMs and with image-backed VMs (also with --block-migration). All hosts share /var/lib/nova/instances via NFS for local storage. The VMs to be live migrated have no extra configuration properties tying them to AZs or similar; they are plain VMs created from the Horizon dashboard.

Steps to reproduce
==================
Upgrade from Zed to Antelope, then try to live-migrate VMs (sketched below).
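For context, the migration attempts looked roughly like the following. This is a sketch, not the exact commands from the report; "my-vm" is a placeholder server name:

    # Live-migrate a volume-backed VM, letting the scheduler pick a target host
    openstack server migrate --live-migration my-vm

    # Same for an image-backed VM, forcing block migration
    openstack server migrate --live-migration --block-migration my-vm

    # Verify placement afterwards (admin credentials required)
    openstack server show my-vm -c status -c OS-EXT-SRV-ATTR:host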
Logs & Configs
==============
The environment uses libvirt/KVM with neutron-api and OVN SDN. Nova version 27.1.0:

    ii  nova-api-os-compute  3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute - OpenStack Compute API frontend
    ii  nova-common          3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute - common files
    ii  nova-conductor       3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute - conductor service
    ii  nova-scheduler       3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute - virtual machine scheduler
    ii  nova-spiceproxy      3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute - spice html5 proxy
    ii  python3-nova         3:27.1.0-0ubuntu1.2~cloud0  all  OpenStack Compute Python 3 libraries
    ii  python3-novaclient   2:18.3.0-0ubuntu1~cloud0    all  client library for OpenStack Compute API - 3.x

Filters enabled: AvailabilityZoneFilter,ComputeFilter,ImagePropertiesFilter,DifferentHostFilter,SameHostFilter

Charm configs are defaults with no changes. Live migration worked under Zed but fails now; below is what debug logging on the nova-cloud-controller shows.

FULL LOG: https://pastebin.com/NvMazzkC

In short, nova-scheduler iterates through the hosts, and this happens immediately with each available host until the list is exhausted:

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-10.maas', 'os-host-10.maas')" "released" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: held 0.003s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:423

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Acquiring lock "('os-host-11.maas', 'os-host-11.maas')" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:404

2024-08-07 10:15:36.663 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-11.maas', 'os-host-11.maas')" acquired by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: waited 0.000s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:409

2024-08-07 10:15:36.664 1307737 DEBUG nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Update host state from compute node (all properties here pulled from that compute node)

Update host state with aggregates: [Aggregate(created_at=2023-11-01T17:48:42Z,deleted=False,deleted_at=None,hosts=['os-host-4-shelf.maas','os-host-1.maas','os-host-2.maas','os-host-9.maas','os-host-11.maas','os-host-10.maas','os-host-6.maas','os-host-8.maas','os-host-7.maas','os-host-5.maas','os-host-3.maas'],id=1,metadata={availability_zone='nova'},name='nova_az',updated_at=None,uuid=9e0b10a6-8030-4bbf-92a7-724d4cb3a0d0)] _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:172

Update host state with service dict: {'id': 52, 'uuid': 'c6778fc7-5575-4859-b6ad-cdca697cebac', 'host': 'os-host-11.maas', 'binary': 'nova-compute', 'topic': 'compute', 'report_count': 14216, 'disabled': False, 'disabled_reason': None, 'last_seen_up': datetime.datetime(2024, 8, 7, 10, 15, 36, tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 66, 'created_at': datetime.datetime(2024, 8, 5, 18, 44, 9, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2024, 8, 7, 10, 15, 36, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': False} _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:175

2024-08-07 10:15:36.666 1307737 DEBUG nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Update host state with instances: ['16a8944d-2ce0-4e3d-88d2-69c3752f3a63', '3d9ff4c9-4056-4bab-968e-22d4cb286113', '9a03c8e5-fd84-4802-a9bb-a9a93975775d', 'fffbea8e-3b01-4ede-8b47-f3d000975fd5'] _locked_update /usr/lib/python3/dist-packages/nova/scheduler/host_manager.py:178

2024-08-07 10:15:36.666 1307737 DEBUG oslo_concurrency.lockutils [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Lock "('os-host-11.maas', 'os-host-11.maas')" "released" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: held 0.003s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:423

2024-08-07 10:15:36.667 1307737 INFO nova.scheduler.host_manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Host filter ignoring hosts: os-host-6.maas, os-host-3.maas, os-host-7.maas, os-host-9.maas, os-host-11.maas, os-host-5.maas, os-host-10.maas, os-host-8.maas

2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] Filtered [] _get_sorted_hosts /usr/lib/python3/dist-packages/nova/scheduler/manager.py:675

2024-08-07 10:15:36.667 1307737 DEBUG nova.scheduler.manager [None req-2aa2922e-66b3-4543-81d5-ce8d92fb0eeb 91e3c47f7f6a42f1946f9b96d6e07be7 8ce43a2a472e424e8419635cd279b222 - - da112566f0a44d0c898dde46aee63dd7 da112566f0a44d0c898dde46aee63dd7] There are 0 hosts available but 1 instances requested to build. _ensure_
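Not part of the original report, but for triage, the following commands (assuming shell access to the nova-cloud-controller unit and admin OpenStack credentials) can confirm the filter set and the compute service state that the log excerpt refers to:

    # Confirm which scheduler filters are actually enabled
    grep -r enabled_filters /etc/nova/

    # Cross-check the computes against the service dict in the log
    # ('disabled': False, 'forced_down': False)
    openstack compute service list --service nova-compute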
** Affects: nova
   Importance: Undecided
       Status: New