Public bug reported:

Description
===========

The tempest test test_create_server_with_scheduler_hint_group_affinity fails on OpenStack Yoga but passes on OpenStack Victoria. Both runs use the same hardware and the same configuration.

-----

Relevant info:

1: CPU pinning is enabled via vcpu_pin_set in nova.conf.
2: The property hw:cpu_policy=dedicated is set in the flavor.

This configuration has been working for years. There appears to be a race: both claims are made before the free CPU list is updated.

-----

Relevant logs:

CPU 64 is in the list of usable CPUs:

2023-06-09 21:26:01.223 858862 INFO nova.virt.hardware [-] Computed NUMA topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70]], vCPUs mapping: [(0, 64)]

The first claim is made:

2023-06-09 21:26:01.223 858862 INFO nova.compute.claims [-] [instance: ecc5bf99-9583-4acd-b075-19535e380c67] Claim successful on node foo.example.com

CPU 64 is still available:

2023-06-09 21:26:01.261 858862 INFO nova.virt.hardware [-] Computed NUMA topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70]], vCPUs mapping: [(0, 64)]

The second claim is made:

2023-06-09 21:26:01.262 858862 INFO nova.compute.claims [-] [instance: f65fe4dd-5733-4a9d-be71-32f79e514906] Claim successful on node foo.example.com

The error is now seen:

2023-06-09 21:26:01.351 858862 ERROR nova.compute.manager [-] [instance: f65fe4dd-5733-4a9d-be71-32f79e514906] Failed to build and run instance: nova.exception.CPUPinningInvalid: CPU set to pin [64] must be a subset of free CPU set [8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86]

Additional error: ERROR state.:
nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance...

Steps to reproduce
==================

Enable CPU pinning in OpenStack Nova and run the tempest test test_create_server_with_scheduler_hint_group_affinity. It fails every time for me.

Expected result
===============

Test passes

Actual result
=============

Test fails

Environment
===========

Nova version: 25.0.2

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2023693

Title:
  Tempest failure due to possible affinity group race and cpu pinning

Status in OpenStack Compute (nova):
  New
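The report's hypothesis is that both claims computed their pinning from the same free-CPU snapshot before either pin was recorded, so both chose pCPU 64 and the second failed the subset check quoted in the CPUPinningInvalid error. The Python sketch below models only that hypothesis. HostCPUs, fit, pin, build_racy and build_serialized are hypothetical names, not Nova classes or APIs; the four-pCPU free set is a hypothetical subset of the logged usable list; and the fitting rule (take the first free pCPU in a preferred order) is an assumption chosen so the first pick is 64, as in the logs. It makes no claim about where Nova actually takes its locks.

```python
import threading
from dataclasses import dataclass, field


class CPUPinningInvalid(Exception):
    """Hypothetical stand-in for nova.exception.CPUPinningInvalid."""


@dataclass
class HostCPUs:
    """Toy model of one compute node's dedicated pCPUs (not Nova's real classes)."""
    free: set        # pCPUs not yet pinned by any instance
    preferred: list  # assumed order in which the fitting step tries pCPUs
    lock: threading.Lock = field(default_factory=threading.Lock)

    def fit(self) -> int:
        # Step 1 (like "Computed NUMA topology CPU pinning"): choose a pCPU
        # from whatever the free set looks like right now.
        for cpu in self.preferred:
            if cpu in self.free:
                return cpu
        raise CPUPinningInvalid("no free pCPU left")

    def pin(self, cpus: set) -> None:
        # Step 2: commit the choice, mirroring the subset check in the log line.
        if not cpus <= self.free:
            raise CPUPinningInvalid(
                f"CPU set to pin {sorted(cpus)} must be a subset of "
                f"free CPU set {sorted(self.free)}")
        self.free -= cpus


def build_racy(host: HostCPUs, instances: list) -> dict:
    # Every fit runs before any pin, so all instances see pCPU 64 as free.
    choices = {inst: host.fit() for inst in instances}
    outcome = {}
    for inst, cpu in choices.items():
        try:
            host.pin({cpu})
            outcome[inst] = cpu
        except CPUPinningInvalid as exc:
            outcome[inst] = exc
    return outcome


def build_serialized(host: HostCPUs, instances: list) -> dict:
    # Fit and pin run under one lock, so each claim sees the previous pin.
    outcome = {}
    for inst in instances:
        with host.lock:
            cpu = host.fit()
            host.pin({cpu})
        outcome[inst] = cpu
    return outcome


if __name__ == "__main__":
    ids = ["ecc5bf99", "f65fe4dd"]  # instance UUID prefixes from the logs
    # Second entry is a CPUPinningInvalid for pCPU 64.
    print(build_racy(HostCPUs({64, 20, 8, 52}, [64, 20, 8, 52]), ids))
    # {'ecc5bf99': 64, 'f65fe4dd': 20}
    print(build_serialized(HostCPUs({64, 20, 8, 52}, [64, 20, 8, 52]), ids))
```

In the racy variant the second instance hits the same "CPU set to pin [64] must be a subset of free CPU set ..." failure seen in the logs. In the serialized variant the second instance sees 64 as taken and, under the toy fitting rule, gets pCPU 20; real Nova placement of the second vCPU may differ.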