Public bug reported:

Description
===========
Isolated to a single hypervisor. The Nova scheduler randomly fails to schedule instances whose flavor uses CPU pinning and hugepages; the failure rate increases as the number of running instances grows.

Steps to reproduce
==================
1) Hypervisor with two NUMA nodes, 2x Intel Gold 6126, 256GB RAM (128GB in each NUMA node), 61440x2M hugepages in each node. The hypervisor runs nothing other than OpenStack.
2) Flavor specified with:
   - 4 vCPUs
   - 20480 MB RAM
   - hw:cpu_policy dedicated
   - hw:cpu_thread_policy require
   - hw:mem_page_size 2MB
3) Try to schedule 12 instances of this flavor.

Expected result
===============
12 instances running on the hypervisor, neatly packed and using up all hugepages.

Actual result
=============
NUMA node 0 is full, while NUMA node 1 has only 2-3 instances or so. This varies from attempt to attempt.

Workaround
==========
Leave all running instances as they are and keep scheduling more instances until the desired number have been created successfully. (It took me 32 create attempts to fill all 12 slots.)

The problem does not occur if hugepages are disabled in both the flavor and the hypervisor.

Environment
===========
Running OpenStack Ocata, RDO packages on CentOS 7.4.

Linux 3.10.0-514.10.2.el7.x86_64
nova 15.0.7

Compute:
openstack-nova-compute-15.0.7-1.el7.noarch

Ctrl:
openstack-nova-conductor-15.0.7-1.el7.noarch
python2-novaclient-7.1.2-1.el7.noarch
python-nova-15.0.7-1.el7.noarch
openstack-nova-novncproxy-15.0.7-1.el7.noarch
openstack-nova-placement-api-15.0.7-1.el7.noarch
openstack-nova-common-15.0.7-1.el7.noarch
openstack-nova-api-15.0.7-1.el7.noarch
openstack-nova-scheduler-15.0.7-1.el7.noarch
openstack-nova-console-15.0.7-1.el7.noarch

Using Libvirt+KVM
libvirt 3.2.0-14.el7_4 (ev)
qemu 2.9.0-16.el7_4 (ev)

Storage is pure qcow2 on /var/lib/nova
Neutron with linuxbridge-agent for networking.

** Affects: nova
     Importance: Undecided
         Status: New

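The expected result of 12 instances follows from the hugepage figures in the reproduction steps. A minimal sketch of that arithmetic is given here; the variable names are illustrative, and it assumes each instance is placed entirely within one NUMA node (the report sets no hw:numa_nodes property, so this is an assumption rather than something the report states).

```python
# Hugepage capacity check for the hypervisor and flavor in the report.
# All figures are taken from the reproduction steps; names are illustrative.

HUGEPAGE_SIZE_MB = 2          # hw:mem_page_size 2MB
HUGEPAGES_PER_NODE = 61440    # 2M hugepages configured in each NUMA node
NUMA_NODES = 2                # two NUMA nodes on the hypervisor
FLAVOR_RAM_MB = 20480         # flavor memory

# Number of 2M pages one instance consumes.
pages_per_instance = FLAVOR_RAM_MB // HUGEPAGE_SIZE_MB

# Assumption: an instance must fit entirely inside a single NUMA node.
instances_per_node = HUGEPAGES_PER_NODE // pages_per_instance
leftover_pages_per_node = HUGEPAGES_PER_NODE % pages_per_instance

total_instances = instances_per_node * NUMA_NODES

print("pages per instance:", pages_per_instance)
print("instances per NUMA node:", instances_per_node)
print("leftover pages per NUMA node:", leftover_pages_per_node)
print("total instances:", total_instances)
```

Under these assumptions each node holds exactly six instances with no hugepages left over, so hugepage memory alone does not explain why NUMA node 1 stops at 2-3 instances.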
https://bugs.launchpad.net/bugs/1738501

Title:
  Nova scheduler randomly fails to schedule CPU-pinned instance-flavors with hugepages - fails increases as running instance count grows

Status in OpenStack Compute (nova):
  New