2018-05-18 19:58 GMT+08:00 Nadathur, Sundar <sundar.nadat...@intel.com>:
> Hi Matt,
>
> On 5/17/2018 3:18 PM, Matt Riedemann wrote:
> > On 5/17/2018 3:36 PM, Nadathur, Sundar wrote:
> > > This applies only to the resources that Nova handles, IIUC, which does
> > > not handle accelerators. The generic method that Alex talks about is
> > > obviously preferable but, if that is not available in Rocky, is the
> > > filter an option?
> >
> > If nova isn't creating accelerator resources managed by cyborg, I have
> > no idea why nova would be doing quota checks on those types of
> > resources. And no, I don't think adding a scheduler filter to nova for
> > checking accelerator quota is something we'd add either. I'm not sure
> > that would even make sense - the quota for the resource is per tenant,
> > not per host, is it? The scheduler filters work on a per-host basis.
>
> Can we not extend BaseFilter.filter_all() to get all the hosts in a
> filter?
> https://github.com/openstack/nova/blob/master/nova/filters.py#L36
>
> I should have made it clearer that this putative filter will be
> out-of-tree, and needed only till better solutions become available.
>
> > Like any other resource in openstack, the project that manages that
> > resource should be in charge of enforcing quota limits for it.
>
> Agreed. Not sure how other projects handle it, but here's the situation
> for Cyborg. A request may get scheduled on a compute node with no
> intervention by Cyborg. So, the earliest check that can be made today is
> in the selected compute node. A simple approach can result in quota
> violations as in this example.
>
> Say there are 5 devices in a cluster. A tenant has a quota of 4 and is
> currently using 3. That leaves 2 unused devices, of which the tenant is
> permitted to use only one. But he may submit two concurrent requests, and
> they may land on two different compute nodes. The Cyborg agent in each
> node will see the current tenant usage as 3 and let the request go
> through, resulting in a quota violation.

That's a bad design if the Cyborg agent in each node simply lets the request
go through, and the current Cyborg quota design does not have this issue.
(A sketch of the kind of atomic check I mean is at the bottom of this mail.)

> To prevent this, we need some kind of atomic update, like SQLAlchemy's
> with_lockmode():
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
> That seems to have issues, as documented in the link above. Also, since
> every compute node does that, it would also serialize the bringup of all
> instances with accelerators, across the cluster.
>
> If there is a better solution, I'll be happy to hear it.
>
> Thanks,
> Sundar
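
On the BaseFilter.filter_all() question above: for what it's worth, here is a
rough sketch of what such an out-of-tree filter could look like, just to make
the idea concrete. This is not working Cyborg code -- the class name and the
two _get_* helpers are hypothetical placeholders for whatever calls the filter
would actually make to Cyborg -- and it assumes the BaseHostFilter /
filter_all() interface from the nova/filters.py link above.

# Hypothetical out-of-tree filter, for illustration only.
# The _get_* helpers are made-up placeholders, not an existing Cyborg API.
from nova.scheduler import filters


def _get_project_accelerator_usage(project_id):
    # Placeholder: a real filter would ask Cyborg for the project's
    # current accelerator usage.
    return 3


def _get_project_accelerator_quota(project_id):
    # Placeholder: a real filter would look up the project's quota.
    return 4


class AcceleratorQuotaFilter(filters.BaseHostFilter):
    """Pass all hosts, or none, based on the project's accelerator quota."""

    # The quota decision does not depend on the host, so make it once per
    # request, and don't block a rebuild of an already-running instance.
    run_filter_once_per_request = True
    RUN_ON_REBUILD = False

    def filter_all(self, filter_obj_list, spec_obj):
        # One per-project decision for the whole host list, instead of the
        # usual per-host host_passes() check.
        in_use = _get_project_accelerator_usage(spec_obj.project_id)
        limit = _get_project_accelerator_quota(spec_obj.project_id)
        if in_use >= limit:
            return []
        return list(filter_obj_list)

    def host_passes(self, host_state, spec_obj):
        # Unused here: filter_all() above already decides for every host.
        return True

Of course, as Matt points out, a filter like this only rejects the request at
scheduling time; it does not by itself close the race between two concurrent
requests, which is what the second sketch below is about.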
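
To make the "atomic update" point concrete: below is a minimal,
self-contained sketch of a per-project reservation done with
SELECT ... FOR UPDATE. Again, this is not Cyborg's actual schema or code --
the accelerator_quota_usages table, reserve_device() and QuotaExceeded are
made up for illustration -- and it uses SQLAlchemy's with_for_update(), the
current name for the deprecated with_lockmode() mentioned in the wiki page
above.

# Hypothetical sketch only: the model, table and function names are
# illustrative, not Cyborg's actual schema or API.
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class AcceleratorQuotaUsage(Base):
    __tablename__ = 'accelerator_quota_usages'
    project_id = sa.Column(sa.String(36), primary_key=True)
    in_use = sa.Column(sa.Integer, nullable=False)
    hard_limit = sa.Column(sa.Integer, nullable=False)


class QuotaExceeded(Exception):
    pass


engine = sa.create_engine('mysql+pymysql://cyborg:secret@dbhost/cyborg')
Session = sessionmaker(bind=engine)


def reserve_device(project_id):
    """Atomically reserve one device for a project, or raise QuotaExceeded.

    SELECT ... FOR UPDATE locks the project's usage row, so two concurrent
    requests -- even ones that landed on different compute nodes -- are
    serialized here: the second one blocks until the first commits, then
    sees the updated in_use value and is rejected if the quota is reached.
    """
    session = Session()
    try:
        usage = (session.query(AcceleratorQuotaUsage)
                 .filter_by(project_id=project_id)
                 .with_for_update()          # pessimistic row lock
                 .one())
        if usage.in_use + 1 > usage.hard_limit:
            raise QuotaExceeded(project_id)
        usage.in_use += 1
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

In the 5-device example above, the second of the two concurrent requests
would block on the row lock, see in_use == 4 once the first one commits, and
be rejected. The lock covers one short transaction per request, and only for
that project's row, so it should not serialize instance bring-up across the
whole cluster the way a cluster-wide lock would.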