Sean, thank you for the detailed explanation. I really hope we can backport the fix to Queens; upgrading my cluster would be much harder for me..!
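
In the meantime I am going with the interim workaround Sean mentions below. A minimal example, assuming a flavor named vm-2 as in Mike's config:

    openstack flavor set vm-2 --property hw:numa_nodes=2

With that set the guest's vCPUs are spread across both NUMA nodes, so pinning succeeds even though the VFs all live in NUMA region 0.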
On Tue, Nov 13, 2018 at 8:42 AM Sean Mooney <smoo...@redhat.com> wrote:
>
> On Tue, 2018-11-13 at 07:52 -0500, Satish Patel wrote:
> > Mike,
> >
> > Here is the bug which I reported: https://bugs.launchpad.net/bugs/1795920
>
> Actually this is a related but different bug, based on the description
> below. Thanks for highlighting this to me.
>
> > Cc'ing: Sean
> >
> > Sent from my iPhone
> >
> > On Nov 12, 2018, at 8:27 AM, Satish Patel <satish....@gmail.com> wrote:
> >
> > > Mike,
> > >
> > > I had the same issue a month ago when I rolled out SR-IOV in my cloud,
> > > and this is what I did to solve it: set the following in the flavor.
> > >
> > > hw:numa_nodes=2
> > >
> > > It will spread the instance's vCPUs across NUMA nodes. Yes, there will
> > > be a small penalty, but if you tune your application accordingly you
> > > are fine.
> > >
> > > Yes, this is a bug. I have already opened a ticket and I believe folks
> > > are working on it, but it is not a simple fix. They may release a new
> > > feature in a coming OpenStack release.
> > >
> > > Sent from my iPhone
> > >
> > > On Nov 11, 2018, at 9:25 PM, Mike Joseph <m...@mode.net> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > It appears that the numa_policy attribute of a PCI alias is ignored
> > > > for flavors referencing that alias if the flavor also has
> > > > hw:cpu_policy=dedicated set. The alias config is:
> > > >
> > > > alias = { "name": "mlx", "device_type": "type-VF",
> > > > "vendor_id": "15b3", "product_id": "1004", "numa_policy": "preferred" }
> > > >
> > > > And the flavor config is:
> > > >
> > > > {
> > > >   "OS-FLV-DISABLED:disabled": false,
> > > >   "OS-FLV-EXT-DATA:ephemeral": 0,
> > > >   "access_project_ids": null,
> > > >   "disk": 10,
> > > >   "id": "221e1bcd-2dde-48e6-bd09-820012198908",
> > > >   "name": "vm-2",
> > > >   "os-flavor-access:is_public": true,
> > > >   "properties": "hw:cpu_policy='dedicated', pci_passthrough:alias='mlx:1'",
> > > >   "ram": 8192,
> > > >   "rxtx_factor": 1.0,
> > > >   "swap": "",
> > > >   "vcpus": 2
> > > > }
>
> Satish, in your case you were trying to use neutron's SR-IOV vnic types,
> such that the VF would be connected to a neutron network. In this case the
> Mellanox ConnectX-3 virtual functions are being passed to the guest using
> the PCI alias via the flavor, which means they cannot be used to connect
> to neutron networks, but they should be able to use affinity policies.
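>
> To make the distinction concrete, the two ways a VF can be consumed look
> roughly like this (a sketch; the network, port and server names are made
> up):
>
>     # neutron-based: the VF backs a network interface, requested via a port
>     openstack port create --network sriov-net --vnic-type direct sriov-port
>     openstack server create --flavor vm-2 --image <image> --port sriov-port vm1
>
>     # flavor-based: the VF is requested through the PCI alias extra spec
>     openstack flavor set vm-2 --property "pci_passthrough:alias"="mlx:1"
>
> Only the flavor-based path consumes the device through the alias, and
> therefore through its numa_policy.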
> > > > In short, our compute nodes have an SR-IOV Mellanox NIC (ConnectX-3)
> > > > with 16 VFs configured. We wish to expose these VFs to VMs that
> > > > schedule on the host. However, the NIC is in NUMA region 0, which
> > > > means that only half of the compute node's CPU cores would be usable
> > > > if we required VM affinity to the NIC's NUMA region. But we don't
> > > > need that, since we are okay with cross-region access to the PCI
> > > > device.
> > > >
> > > > However, we do need CPU pinning to work, in order to have efficient
> > > > cache hits on our VM processes. Therefore, we still want to pin our
> > > > vCPUs to pCPUs, even if the pins end up on a NUMA region opposite
> > > > the NIC. The spec for numa_policy seems to indicate that this is
> > > > exactly the intent of the option:
> > > >
> > > > https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/share-pci-between-numa-nodes.html
> > > >
> > > > But, with the above config, we still get PCI affinity scheduling
> > > > errors:
> > > >
> > > > 'Insufficient compute resources: Requested instance NUMA topology
> > > > together with requested PCI devices cannot fit the given host NUMA
> > > > topology.'
> > > >
> > > > This strikes me as a bug, but perhaps I am missing something here?
>
> Yes, this does in fact seem like a new bug. Can you add myself and Stephen
> to the bug once you file it? In the bug, please include the version of
> OpenStack you were deploying.
>
> In the interim, setting hw:numa_nodes=2 will allow you to pin the guest
> without the error; however, the flavor and alias you have provided should
> have been enough.
>
> I'm hoping that we can fix both the alias and neutron based cases this
> cycle, but to do so we will need to re-propose the original Queens spec
> for Stein and discuss whether we can backport any of the fixes or whether
> this would only be completed in Stein+. I would hope we could backport
> fixes for the flavor based use case, but the neutron based use case would
> likely be Stein+.
>
> regards
> sean
>
> > > > Thanks,
> > > > MJ

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack