Public bug reported:

We have been upgrading our sites from RDO to OSA. This process involved live migrating all VMs off a compute host before reinstalling it with the OSA playbooks.
Note: the compute host is not "removed" from OpenStack in any way; the new OSA node is the *same* hardware with the same hostname, simply reinstalled as OSA. This appears to have consequences for the way the placement API works: we have noticed that when live migrating, the scheduler will often choose a highly loaded node where an empty node exists. For example, in the output below from my live migration script, the VM is being migrated from cc-compute04-kna1; the scheduler has chosen cc-compute01-kna1 as the target despite the load it currently has, and despite compute09, 15 and 18 all being empty:

Migration Destination: cc-compute01-kna1
Migration ID: 12993

+-------------------+------------+-----+-----------+---------+
| Host              | Project    | CPU | Memory MB | Disk GB |
+-------------------+------------+-----+-----------+---------+
| cc-compute04-kna1 | (used_now) | 124 |    254976 |    2790 |
| cc-compute01-kna1 | (used_now) | 230 |    466432 |    8210 |
+-------------------+------------+-----+-----------+---------+
| cc-compute03-kna1 | (used_now) | 174 |    327680 |    4740 |
| cc-compute05-kna1 | (used_now) | 198 |    457728 |    4430 |
| cc-compute06-kna1 | (used_now) | 163 |    366592 |    4650 |
| cc-compute07-kna1 | (used_now) | 170 |    415744 |    4460 |
| cc-compute08-kna1 | (used_now) | 178 |    382464 |    4750 |
| cc-compute09-kna1 | (used_now) |   0 |      2048 |       0 |
| cc-compute11-kna1 | (used_now) | 131 |    313856 |    3100 |
| cc-compute12-kna1 | (used_now) | 176 |    392704 |    4800 |
| cc-compute13-kna1 | (used_now) | 173 |    390656 |    5470 |
| cc-compute14-kna1 | (used_now) |   2 |      4096 |      50 |
| cc-compute15-kna1 | (used_now) |   0 |      2048 |       0 |
| cc-compute16-kna1 | (used_now) | 170 |    355840 |    5410 |
| cc-compute17-kna1 | (used_now) | 281 |    646656 |    5370 |
| cc-compute18-kna1 | (used_now) |   0 |      2048 |       0 |
| cc-compute19-kna1 | (used_now) | 207 |    517120 |    4860 |
| cc-compute20-kna1 | (used_now) | 223 |    560640 |    5150 |
| cc-compute23-kna1 | (used_now) | 184 |    406528 |    6350 |
| cc-compute24-kna1 | (used_now) | 190 |    585216 |    4820 |
| cc-compute25-kna1 | (used_now) | 235 |    491520 |    5500 |
| cc-compute26-kna1 | (used_now) | 283 |    610304 |    9390 |
| cc-compute27-kna1 | (used_now) | 200 |    573440 |    6730 |
| cc-compute28-kna1 | (used_now) | 269 |    587264 |    6600 |
| cc-compute29-kna1 | (used_now) | 245 |    494080 |    8480 |
+-------------------+------------+-----+-----------+---------+

This is not an isolated case; we have seen it frequently, to the point where we override the scheduler and use targeted migrations to achieve better load balancing.
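For reference, the per-provider view the scheduler works from can be dumped straight from the Placement API. Below is a minimal shell sketch (not part of the original tooling); PLACEMENT and TOKEN are placeholders standing in for the redacted endpoint and token used in the curl commands later in this report:

  # Placeholders for the redacted endpoint/token used elsewhere in this report.
  PLACEMENT=http://placement-host:8780
  TOKEN=...

  # Print the usage recorded against every resource provider, so that
  # placement's view can be compared with the hypervisor list above.
  for uuid in $(curl -sg -H "X-Auth-Token: $TOKEN" -H "Accept: application/json" \
          "$PLACEMENT/resource_providers" \
          | python -c 'import sys, json; sys.stdout.write("\n".join(rp["uuid"] for rp in json.load(sys.stdin)["resource_providers"]))'); do
      echo "== $uuid =="
      curl -sg -H "X-Auth-Token: $TOKEN" -H "Accept: application/json" \
          "$PLACEMENT/resource_providers/$uuid/usages" | python -m json.tool
  done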
Interrogating the Placement API for a compute node (09) prior to reinstallation, I can find its UUID:

{
    "generation": 480003,
    "links": [
        {"href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5", "rel": "self"},
        {"href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/inventories", "rel": "inventories"},
        {"href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/usages", "rel": "usages"},
        {"href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/aggregates", "rel": "aggregates"}
    ],
    "name": "cc-compute09-kna1",
    "uuid": "d6aeeeb0-0cab-4e3f-a070-9808801b94a5"
}

After the node is reinstalled it has a new UUID (note that it has also registered under its FQDN rather than the short hostname):

{
    "generation": 71,
    "links": [
        {"href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4", "rel": "self"},
        {"href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/inventories", "rel": "inventories"},
        {"href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/usages", "rel": "usages"},
        {"href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/aggregates", "rel": "aggregates"}
    ],
    "name": "compute09.openstack.local",
    "uuid": "d7f483ff-3b91-4d13-9900-0ec24c3a06a4"
}

This new resource provider shows 0 consumed resources:

curl -g -X GET http://********:8780/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/usages \
    -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" \
    -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" \
    -H "X-Auth-Token:************" | python -m json.tool
{
    "resource_provider_generation": 72,
    "usages": {
        "DISK_GB": 0,
        "MEMORY_MB": 0,
        "VCPU": 0
    }
}

Investigating the resource_providers table shows potential duplicate entries:

MariaDB [nova_api]> select * from resource_providers;
+---------------------+---------------------+-----+--------------------------------------+------------------------------+------------+----------+------------------+--------------------+
| created_at          | updated_at          | id  | uuid                                 | name                         | generation | can_host | root_provider_id | parent_provider_id |
+---------------------+---------------------+-----+--------------------------------------+------------------------------+------------+----------+------------------+--------------------+
| 2018-04-25 21:25:32 | 2019-04-17 07:08:24 |   1 | cbb2c235-ed5f-4f63-9015-1edfe91d63c8 | cc-compute02-kna1            |     195067 |        0 |                1 | NULL               |
| 2018-04-25 21:44:17 | 2019-05-02 13:23:34 |   2 | 6125fdeb-370f-4139-9d1c-369e9eb4e620 | cc-compute-lsd01-kna1        |         41 |        0 |                2 | NULL               |
| 2018-04-25 22:13:01 | 2019-05-20 13:11:55 |   3 | 452b7f99-a178-4dc7-9fea-e9d9ab6a3e99 | cc-compute05-kna1            |     450192 |        0 |                3 | NULL               |
| 2018-04-25 22:13:08 | 2019-06-10 12:28:41 |   4 | 03b420df-79fb-4f0a-aede-bdbd62ce9ce3 | cc-compute03-kna1            |     424867 |        0 |                4 | NULL               |
| 2018-04-25 22:13:08 | 2019-06-14 06:29:47 |   5 | 9386d418-339c-4010-baa5-18e2aa601a3c | cc-compute04-kna1            |     479160 |        0 |                5 | NULL               |
| 2018-04-25 22:46:46 | 2019-05-20 13:39:00 |   6 | 7b0580e3-7592-4c3a-a0e9-a8d23f3550d7 | cc-compute07-kna1            |     441489 |        0 |                6 | NULL               |
| 2018-04-25 22:46:47 | 2019-04-19 18:53:45 |   7 | 98e1b299-239f-488c-a7a0-3f78e76c8f6b | cc-compute06-kna1            |     396721 |        0 |                7 | NULL               |
| 2018-04-25 22:46:50 | 2019-05-24 07:28:59 |   8 | 64c2b0fb-4d7e-4d5f-92bc-69e00a3cb85e | cc-compute08-kna1            |     449994 |        0 |                8 | NULL               |
| 2018-04-26 00:47:56 | 2019-06-11 20:43:47 |  11 | 61708a8f-77fd-47dc-9140-6ea613509506 | cc-compute14-kna1            |     474210 |        0 |               11 | NULL               |
| 2018-04-26 00:48:01 | 2019-05-09 12:20:15 |  12 | 9e082274-568d-49a2-9801-05b2390f7dfa | cc-compute16-kna1            |     432294 |        0 |               12 | NULL               |
| 2018-04-26 00:48:04 | 2019-06-11 20:11:28 |  14 | 396bb173-2e46-4d35-963e-9b49acf0add8 | cc-compute22-kna1            |     448545 |        0 |               14 | NULL               |
| 2018-04-26 00:48:06 | 2019-05-21 13:07:23 |  15 | 80e5f3a7-e4a3-43d1-a7a8-4c118fba7792 | cc-compute12-kna1            |     450359 |        0 |               15 | NULL               |
| 2018-04-26 00:48:20 | 2019-05-16 14:32:54 |  18 | b86db974-5787-4012-a7df-26aeb8e73574 | cc-compute20-kna1            |     425960 |        0 |               18 | NULL               |
| 2018-04-26 00:48:20 | 2019-06-12 12:24:24 |  19 | dfb35aab-2af9-4d86-bccb-76959c7f68ed | cc-compute18-kna1            |     435686 |        0 |               19 | NULL               |
| 2018-04-26 00:48:22 | 2019-05-07 10:55:46 |  20 | 4decfcd0-cca2-4ba5-9f83-a86b8f2a8e4d | cc-compute17-kna1            |     418818 |        0 |               20 | NULL               |
| 2018-10-31 12:04:48 | 2019-04-24 08:32:36 |  28 | 266e5266-f811-4b24-949f-3ed9e841c479 | cc-compute10-kna1            |     166818 | NULL     |               28 | NULL               |
| 2018-11-01 18:59:56 | 2019-06-14 06:29:52 |  34 | 5180de9c-c964-4661-bfbd-893cdfc19f32 | compute25.openstack.local    |     271667 | NULL     |               34 | NULL               |
| 2018-11-01 18:59:56 | 2019-06-14 06:29:47 |  37 | 3a456de2-68ea-4472-95dd-2db1c7b29661 | compute24.openstack.local    |     283689 | NULL     |               37 | NULL               |
| 2019-02-06 19:45:50 | 2019-06-14 06:29:39 |  43 | 0e5e6b94-2992-4075-a922-320bbe8b1bbb | compute26.openstack.local    |     165203 | NULL     |               43 | NULL               |
| 2019-02-06 19:45:50 | 2019-06-14 06:27:26 |  46 | 008c7549-b638-4130-8e79-858556a787c2 | compute27.openstack.local    |     166810 | NULL     |               46 | NULL               |
| 2019-02-10 17:45:03 | 2019-06-14 06:29:16 |  52 | 1fe21d2b-e6f1-4820-b341-a490cf9704d8 | compute29.openstack.local    |     161380 | NULL     |               52 | NULL               |
| 2019-02-10 17:45:03 | 2019-06-14 06:29:08 |  55 | e636f01c-b5da-4886-8a60-1baa5371bcc5 | compute28.openstack.local    |     159388 | NULL     |               55 | NULL               |
| 2019-04-30 09:53:45 | 2019-06-14 06:29:36 |  76 | 34381a1c-1b4e-4716-b7ba-ea72956b92f7 | compute19.openstack.local    |      56127 | NULL     |               76 | NULL               |
| 2019-04-30 13:20:12 | 2019-06-14 06:29:37 |  79 | 946fa4f1-5f1d-47be-b65c-038a7e20c42b | compute06.openstack.local    |      56068 | NULL     |               79 | NULL               |
| 2019-05-08 08:26:45 | 2019-06-14 06:30:01 |  84 | 30a5e17b-96d3-4806-849f-2d814085b130 | compute01.openstack.local    |      46162 | NULL     |               84 | NULL               |
| 2019-05-08 08:27:01 | 2019-06-14 06:29:45 |  87 | 62f85460-4244-429e-9831-357032a8f5e7 | compute17.openstack.local    |      46258 | NULL     |               87 | NULL               |
| 2019-05-13 11:37:50 | 2019-06-14 06:29:36 |  93 | 4e39206e-b00a-41d9-a2d1-a18085a576a7 | compute23.openstack.local    |      31555 | NULL     |               93 | NULL               |
| 2019-05-13 11:37:51 | 2019-06-14 06:29:46 |  96 | 6db0004d-7bcb-4758-accd-52ef580d967b | compute16.openstack.local    |      40197 | NULL     |               96 | NULL               |
| 2019-05-17 11:50:50 | 2019-06-14 06:29:38 | 102 | 18a0a9f5-c9e7-49a2-8e50-d221aec0a9f0 | compute20.openstack.local    |      31563 | NULL     |              102 | NULL               |
| 2019-05-17 11:50:50 | 2019-06-14 06:29:16 | 105 | 97a16a89-055a-4533-86e5-1285ff1911ff | compute07.openstack.local    |      31495 | NULL     |              105 | NULL               |
| 2019-05-29 11:20:15 | 2019-06-14 06:29:05 | 117 | e088c323-c8cb-4dc6-bb11-675a40cd1fcf | compute12.openstack.local    |      19449 | NULL     |              117 | NULL               |
| 2019-05-29 11:20:16 | 2019-06-14 06:29:27 | 120 | 58f85279-1103-42b6-b01d-e1c8de83b8d2 | compute08.openstack.local    |      19407 | NULL     |              120 | NULL               |
| 2019-05-29 11:20:32 | 2019-06-14 06:29:52 | 123 | 58ac9048-eca2-4f51-8d12-b6165f686cf7 | compute05.openstack.local    |      19392 | NULL     |              123 | NULL               |
| 2019-06-11 09:15:59 | 2019-06-14 06:29:29 | 126 | 882f5ad3-f20f-489f-9a20-e2654fcfa925 | compute13.openstack.local    |       3873 | NULL     |              126 | NULL               |
| 2019-06-11 09:16:23 | 2019-06-14 06:29:23 | 129 | 80e266f2-13f2-439c-b04e-736754fd27cd | compute03.openstack.local    |       3823 | NULL     |              129 | NULL               |
| 2019-06-11 09:16:24 | 2019-06-14 06:29:25 | 132 | 09ef46fa-b9e7-429b-8d5b-f4f46ead3c85 | compute11.openstack.local    |       3844 | NULL     |              132 | NULL               |
| 2019-06-12 12:31:49 | 2019-06-14 06:29:08 | 138 | ebc9a09f-08bb-4839-ab56-c4d06bcc6ed4 | vrtx01-lsd01.openstack.local |        362 | NULL     |              138 | NULL               |
| 2019-06-12 12:32:32 | 2019-06-14 06:29:53 | 141 | d982e5bb-a7d9-40af-b667-43c2f8f2001c | vrtx01-lsd02.openstack.local |        355 | NULL     |              141 | NULL               |
| 2019-06-13 19:42:01 | 2019-06-14 06:30:00 | 147 | ba89a743-b86f-4bb8-8cfa-3f08fc016c6a | compute15.openstack.local    |        612 | NULL     |              147 | NULL               |
| 2019-06-13 19:42:24 | 2019-06-14 06:29:44 | 150 | 68f6b408-ab9f-4fe7-be9c-7e690086f631 | compute18.openstack.local    |        611 | NULL     |              150 | NULL               |
| 2019-06-13 19:42:24 | 2019-06-14 06:29:21 | 153 | f981737a-d8f8-4b0e-8631-eedb95c85907 | compute22.openstack.local    |        592 | NULL     |              153 | NULL               |
| 2019-06-13 19:42:25 | 2019-06-14 06:29:17 | 156 | d7f483ff-3b91-4d13-9900-0ec24c3a06a4 | compute09.openstack.local    |        604 | NULL     |              156 | NULL               |
| 2019-06-13 19:42:26 | 2019-06-14 06:29:09 | 159 | bc05c643-a2db-442d-b721-39db8665f923 | compute14.openstack.local    |        598 | NULL     |              159 | NULL               |
+---------------------+---------------------+-----+--------------------------------------+------------------------------+------------+----------+------------------+--------------------+

Placement returns data on both UUIDs of such a pair. For example, compute18 exists both as cc-compute18-kna1 (id 19) and as compute18.openstack.local (id 150):

curl -g -X GET http://*****:8780/resource_providers/dfb35aab-2af9-4d86-bccb-76959c7f68ed/usages \
    -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" \
    -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" \
    -H "X-Auth-Token:******" | python -m json.tool
{
    "resource_provider_generation": 435686,
    "usages": {
        "DISK_GB": 150,
        "MEMORY_MB": 9728,
        "VCPU": 7
    }
}

curl -g -X GET http://*****:8780/resource_providers/68f6b408-ab9f-4fe7-be9c-7e690086f631/usages \
    -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" \
    -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" \
    -H "X-Auth-Token:*****" | python -m json.tool
{
    "resource_provider_generation": 664,
    "usages": {
        "DISK_GB": 680,
        "MEMORY_MB": 59392,
        "VCPU": 32
    }
}

I am speculating heavily on the cause of the issue; however, other symptoms we have seen are:

- live migration fails as no suitable host is found (despite near-empty nodes)
- new VMs fail to spawn as no suitable host is found (despite near-empty nodes)

These issues lead us to have to continually live migrate VMs to get some load balancing.
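If the stale pre-reinstall providers are indeed the problem, one speculative workaround (a sketch only, not a verified procedure; it reuses the hypothetical PLACEMENT/TOKEN placeholders from the earlier sketch) would be to confirm that the old provider's allocations are orphaned and then delete it:

  # The compute18 pair from the table above; this is the pre-reinstall provider.
  OLD=dfb35aab-2af9-4d86-bccb-76959c7f68ed   # cc-compute18-kna1, id 19

  # List the consumers (instance UUIDs) that still hold allocations against
  # the old provider, to cross-check where those instances actually run now.
  curl -sg -H "X-Auth-Token: $TOKEN" -H "Accept: application/json" \
      "$PLACEMENT/resource_providers/$OLD/allocations" | python -m json.tool

  # Once no allocations remain, the stale provider can be deleted; placement
  # answers 409 Conflict while allocations still exist, which acts as a
  # safety net against removing a provider that is genuinely in use.
  curl -sg -X DELETE -H "X-Auth-Token: $TOKEN" "$PLACEMENT/resource_providers/$OLD"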
Other potentially useful input (or separate bugs): nova-compute.log often has

2019-02-07 13:37:59.362 2632 INFO nova.compute.resource_tracker [req-e0f53ec7-7668-4a64-8ba6-ead35f168e82 - - - - -] Instance 4fba72d0-2e95-4b92-b0f6-a7853dc3e8bd has allocations against this compute host but is not found in the database.

We find this in normal running, but have also found it in relation to live migrations which failed and were not rolled back (for example, as a result of the port_binding error).
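One way to inspect and reconcile such orphaned allocations, assuming the osc-placement CLI plugin and a Rocky-or-newer nova-manage are available (neither of which this report confirms), would be:

  # Show which resource providers hold allocations for the instance named
  # in the log message above.
  openstack resource provider allocation show 4fba72d0-2e95-4b92-b0f6-a7853dc3e8bd

  # If that consumer no longer exists as an instance, its allocations can be
  # dropped explicitly:
  # openstack resource provider allocation delete 4fba72d0-2e95-4b92-b0f6-a7853dc3e8bd

  # Create allocations for instances that are missing them; --verbose
  # reports what was changed.
  nova-manage placement heal_allocations --verbose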
It is also possible to get multiple entries in the services table, though I don't believe this is related; it will be reported in a separate bug:

MariaDB [nova]> select host, services.binary, version from services where host="cc-compute01-kna1";
+-------------------+--------------+---------+
| host              | binary       | version |
+-------------------+--------------+---------+
| cc-compute01-kna1 | nova-compute |      35 |
| cc-compute01-kna1 | nova-compute |       0 |
+-------------------+--------------+---------+

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1832814
Title: Placement API appears to have issues when compute host replaced