Gerrit is down but I've written a functional regression test to recreate the bug and attached it as a patch for now.
** Description changed:

- This is purely based on code inspection right now; I need to write a
- functional test to recreate the issue.

  While triaging bug 1821594, I got to thinking about how placement
  allocations are handled when something fails during a resize, which
  brought to mind an older fix:

  https://review.openstack.org/#/c/543971/6/nova/compute/manager.py@4457

  Looking back on that now, I think the revert during resize_instance is
  OK as long as the instance host/node has not changed, but doing it when
  finish_resize fails was probably a mistake, because the instance.host
  in the nova DB won't match where the allocations exist in placement.
  Before Pike this was fine, since the ResourceTracker would heal the
  allocations in the update_available_resource periodic task, but we
  don't have that anymore.

  So this could result in an instance reported as being on the dest host
  in the nova database with the new flavor, which is where it will get
  rebuilt/rebooted/etc., while placement tracks the instance's resource
  allocations using the old flavor against the source host, which is not
  where the instance is. Furthermore, if finish_resize fails, the
  instance will be in ERROR status, and the user would likely try to hard
  reboot it to correct that status, which would happen on the dest host.
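To make the mismatch concrete, here is a minimal sketch of the state divergence described above. The dicts and the helper function are purely illustrative stand-ins, not real nova or placement APIs:

```python
def revert_on_finish_resize_failure(instance, placement):
    """Simulate the (mistaken) allocation revert when finish_resize fails.

    By the time finish_resize runs, the nova DB already points the
    instance at the destination host with the new flavor. The old fix
    then reverted the placement allocations back to the source host
    with the old flavor, leaving placement out of sync with the DB.
    """
    # Nova DB state after the resize has been cast to the dest host:
    instance["host"] = instance["dest_host"]
    instance["flavor"] = instance["new_flavor"]
    # The revert puts the allocations back on the source host:
    placement[instance["uuid"]] = {
        "host": instance["source_host"],
        "flavor": instance["old_flavor"],
    }


instance = {
    "uuid": "fake-uuid",
    "source_host": "host1",
    "dest_host": "host2",
    "old_flavor": "m1.small",
    "new_flavor": "m1.large",
}
placement = {}
revert_on_finish_resize_failure(instance, placement)

# Mismatch: nova thinks the instance lives on host2 with m1.large,
# but placement accounts for it on host1 with m1.small.
assert instance["host"] == "host2"
assert placement["fake-uuid"]["host"] == "host1"
```

Any later operation that runs on instance.host (hard reboot, rebuild, etc.) will therefore happen on the dest host while the resources are still accounted against the source host.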
** Patch added: "0001-Add-functional-recreate-test-for-regression-bug-1825.patch"
   https://bugs.launchpad.net/nova/+bug/1825537/+attachment/5257070/+files/0001-Add-functional-recreate-test-for-regression-bug-1825.patch

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Confirmed

** Changed in: nova/queens
       Status: New => Confirmed

** Changed in: nova/stein
       Status: New => Confirmed

** Changed in: nova/rocky
       Status: New => Confirmed

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/pike
   Importance: Undecided => Medium

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825537

Title:
  finish_resize failures incorrectly revert allocations

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825537/+subscriptions