Public bug reported:

Nova has a race condition between the resize_instance() compute manager call and the update_available_resources periodic job. If they overlap at the right point, when resize_instance calls finish_resize, the periodic job tracks neither the migration nor the instance on the source host. As a result, the PCPU allocation on the source host is dropped in the resource tracker (not in placement). When the resize is later confirmed, nova tries to free the pinned CPUs on the source host again and fails with CPUUnpinningInvalid because they are already freed.
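To make the sequence easier to follow, here is a minimal toy model of the failure, not nova code. ToySourceHostTracker, periodic_update() and simulate() are hypothetical names invented for illustration, and the local CPUUnpinningInvalid class is only a stand-in for the nova exception of the same name. The sketch assumes the periodic job rebuilds pinned-CPU usage from whatever instances and migrations it currently tracks.

```python
class CPUUnpinningInvalid(Exception):
    """Local stand-in for the nova exception of the same name."""


class ToySourceHostTracker:
    """Toy pinned-CPU accounting for one host (hypothetical, not nova's tracker)."""

    def __init__(self):
        self.pinned = set()

    def pin(self, cpus):
        self.pinned |= set(cpus)

    def unpin(self, cpus):
        cpus = set(cpus)
        # Unpinning CPUs that are not currently pinned is an error.
        if not cpus <= self.pinned:
            raise CPUUnpinningInvalid(
                f"cannot unpin {sorted(cpus)}, pinned: {sorted(self.pinned)}")
        self.pinned -= cpus

    def periodic_update(self, tracked_usages):
        # Assumption: usage is rebuilt from scratch from what the job tracks.
        self.pinned = set()
        for cpus in tracked_usages:
            self.pin(cpus)


def simulate(periodic_tracks_instance):
    """Return the outcome of confirming a resize on the source host."""
    tracker = ToySourceHostTracker()
    instance_cpus = {0, 1}
    tracker.pin(instance_cpus)  # instance is running on the source host

    # The periodic job runs while the resize is in flight. In the racy case
    # it tracks neither the migration nor the instance on the source host.
    tracked = [instance_cpus] if periodic_tracks_instance else []
    tracker.periodic_update(tracked)

    # Confirming the resize frees the pinned CPUs on the source host.
    try:
        tracker.unpin(instance_cpus)
    except CPUUnpinningInvalid:
        return "CPUUnpinningInvalid"
    return "confirmed"
```

Calling simulate(True) models the normal path, where the confirm succeeds. Calling simulate(False) models the racy overlap, where the periodic job has already dropped the pinned CPUs and the confirm fails.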
I've pushed a reproduction test:
https://review.opendev.org/c/openstack/nova/+/810763

It is reproducible at least on master, xena, wallaby, and victoria

** Affects: nova
     Importance: Medium
     Assignee: Balazs Gibizer (balazs-gibizer)
         Status: New

** Tags: compute numa race-condition resize

** Changed in: nova
     Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)

** Changed in: nova
   Importance: Undecided => Medium

** Description changed:

+ It is reproducible at least on master, xena, wallaby, and victoria

** Tags added: compute numa race-condition resize

** Description changed:

- I will push a reproduction test soon.
+ I've pushed a reproduction test:
+ https://review.opendev.org/c/openstack/nova/+/810763

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1944759

Title:
  confirm resize fails with CPUUnpinningInvalid

Status in OpenStack Compute (nova):
  New

