Reviewed:  https://review.opendev.org/714998
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=738110db7492b1360f5f197e8ecafd69a3b141b4
Submitter: Zuul
Branch:    master

commit 738110db7492b1360f5f197e8ecafd69a3b141b4
Author: Balazs Gibizer <balazs.gibi...@est.tech>
Date:   Wed Mar 25 17:48:23 2020 +0100

    Update scheduler instance info at confirm resize

    When a resize is confirmed the instance does not belong to the source
    compute any more. In the past the scheduler instance info was only
    updated by the _sync_scheduler_instance_info periodic. As a result,
    server boots with anti-affinity did not consider the source host.
    Now, at the end of the confirm_resize call, the compute also updates
    the scheduler about the move.

    Change-Id: Ic50e72e289b56ac54720ad0b719ceeb32487b8c8
    Closes-Bug: #1869050

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1869050

Title:
  migration of anti-affinity server fails due to stale scheduler
  instance info

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged

Bug description:
  Description
  ===========

  Steps to reproduce
  ==================
  Have a deployment with 3 compute nodes

  * make sure that the deployment is configured with
    tracks_instance_changes=True (True is the default)
  * create a server group with anti-affinity policy
  * boot server1 into the group
  * boot server2 into the group
  * migrate server2
  * confirm the migration
  * boot server3

  Make sure that between the last two steps there was no periodic
  _sync_scheduler_instance_info running on the compute that hosted
  server2 before the migration.
  This can be done by performing the last two steps right after each
  other without waiting too long, as the interval of that periodic
  (scheduler_instance_sync_interval) defaults to 120 sec.

  Expected result
  ===============
  server3 is booted on the host that server2 was moved away from

  Actual result
  =============
  server3 cannot be booted (NoValidHost)

  Triage
  ======

  The confirm resize call on the source compute does not update the
  scheduler that the instance is removed from this host. This makes the
  scheduler instance info stale and causes the subsequent scheduling
  error.
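
  The failure described in the triage can be illustrated with a small
  toy model. This is only a sketch under stated assumptions, not nova
  code: ToyScheduler, migrate_and_confirm, run_scenario and the host
  names are hypothetical, the scheduler cache is reduced to a dict of
  host -> instance names, and the anti-affinity check is a simplified
  stand-in for the real filter. It shows that if the source host's
  entry is not cleared when the resize is confirmed, every host still
  appears to hold a group member and the third boot finds no valid
  host.

```python
from collections import defaultdict


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's per-host instance info."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.instance_info = defaultdict(set)  # host -> instance names

    def update_instance_info(self, host, instance):
        self.instance_info[host].add(instance)

    def delete_instance_info(self, host, instance):
        self.instance_info[host].discard(instance)

    def pick_host(self, group_members):
        """Simplified anti-affinity: first host with no group member."""
        members = set(group_members)
        for host in self.hosts:
            if not (self.instance_info[host] & members):
                return host
        return None  # models NoValidHost


def migrate_and_confirm(sched, instance, source, dest, notify_on_confirm):
    # The destination reports the instance to the scheduler.
    sched.update_instance_info(dest, instance)
    # Assumption: the fix corresponds to the source also reporting the
    # removal at confirm time instead of waiting for the periodic sync.
    if notify_on_confirm:
        sched.delete_instance_info(source, instance)


def run_scenario(notify_on_confirm):
    sched = ToyScheduler(["host1", "host2", "host3"])
    group = ["server1", "server2"]
    sched.update_instance_info("host1", "server1")
    sched.update_instance_info("host2", "server2")
    migrate_and_confirm(sched, "server2", "host2", "host3", notify_on_confirm)
    # Boot server3 into the same anti-affinity group.
    return sched.pick_host(group)


if __name__ == "__main__":
    print("stale info:", run_scenario(notify_on_confirm=False))
    print("with update at confirm:", run_scenario(notify_on_confirm=True))
```

  In this model the stale case returns None because host2 still lists
  server2, while the updated case returns host2, matching the expected
  result above.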