On 05/18/2018 07:58 AM, Nadathur, Sundar wrote:
Agreed. I'm not sure how other projects handle it, but here is the situation for Cyborg: a request may get scheduled on a compute node with no intervention by Cyborg, so the earliest check that can be made today is on the selected compute node. A simple approach can result in quota violations, as in this example:

    Say there are 5 devices in a cluster. A tenant has a quota of 4 and
    is currently using 3. That leaves 2 unused devices, of which the
    tenant is permitted to use only one. But the tenant may submit two
    concurrent requests, and they may land on two different compute
    nodes. The Cyborg agent on each node will see the current tenant
    usage as 3 and let the request go through, resulting in a quota
    violation.
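
In other words, each agent does a non-atomic check-then-act. A minimal sketch of that pattern (the names and the in-memory dict are illustrative, not actual Cyborg code):

    QUOTA = 4
    db_usage = {'tenant-a': 3}   # stands in for the shared usage record

    def naive_claim(tenant):
        # Each compute node's agent runs this independently:
        current = db_usage[tenant]      # 1. read usage (both nodes see 3)
        if current >= QUOTA:            # 2. check against the quota of 4
            return False
        db_usage[tenant] = current + 1  # 3. the write-back is not atomic
        return True                     #    with the read in step 1

    # Two concurrent requests landing on two nodes: both reads happen
    # before either write, so both claims succeed and usage ends at 5,
    # over the quota of 4.
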
To prevent this, we need some kind of atomic update, like SQLAlchemy's with_lockmode() (https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE). That approach has issues, as documented in the link above. And since every compute node would take that lock, it would also serialize the bringup of all instances with accelerators across the cluster.
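
For concreteness, here is a sketch of that pessimistic approach with SQLAlchemy, using with_for_update(), the newer spelling of with_lockmode(); the quota_usages table and its columns are made up for illustration:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class QuotaUsage(Base):
        __tablename__ = 'quota_usages'   # hypothetical table
        id = Column(Integer, primary_key=True)
        project_id = Column(String(36), index=True)
        in_use = Column(Integer, nullable=False)

    def claim_device(session, project_id, limit):
        # SELECT ... FOR UPDATE blocks every other claimer of this row
        # until we commit -- which is exactly the cross-cluster
        # serialization concern raised above.
        usage = (session.query(QuotaUsage)
                 .filter_by(project_id=project_id)
                 .with_for_update()
                 .one())
        if usage.in_use >= limit:
            session.rollback()   # release the row lock
            return False
        usage.in_use += 1
        session.commit()
        return True
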
If there is a better solution, I'll be happy to hear it.

The solution is to implement the following two specs:

https://review.openstack.org/#/c/509042/
https://review.openstack.org/#/c/569011/

Consuming more resources than a user/project has quota for is not a new problem. Users have been able to go over their quota in all of the services for as long as I can remember: they can do this by essentially DDoS'ing the API with lots of concurrent single-instance build requests [1]. The tenant then ends up in an over-quota situation and is unable to do anything at all until they delete resources.

The only operators I can remember complaining about this issue were public cloud operators -- and rightfully so, since quota abuse in public clouds means their reputation for fairness may be questioned. Most operators I know of solved the problem with *rate-limiting*, which is not the same as quota enforcement. By rate-limiting requests to the APIs, they addressed a symptom: high rates of concurrent requests are what lead to over-quota situations.
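
For illustration only, the kind of per-project token bucket that sits behind such rate limiting (a generic sketch, not any particular OpenStack middleware):

    import time
    from collections import defaultdict

    class TokenBucket:
        def __init__(self, rate, burst):
            self.rate = rate      # sustained requests per second
            self.burst = burst    # maximum burst size
            self.tokens = defaultdict(lambda: float(burst))
            self.stamp = defaultdict(time.monotonic)

        def allow(self, project_id):
            now = time.monotonic()
            elapsed = now - self.stamp[project_id]
            self.stamp[project_id] = now
            # Refill proportionally to elapsed time, capped at burst.
            self.tokens[project_id] = min(
                self.burst, self.tokens[project_id] + elapsed * self.rate)
            if self.tokens[project_id] >= 1.0:
                self.tokens[project_id] -= 1.0
                return True
            return False

Rejecting the burst up front keeps the number of in-flight requests small enough that the usage check rarely races.
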

Nobody is using Cyborg separately from Nova at the moment (or ever?). It's not as if a user will be consuming an accelerator outside of a Nova instance -- since it is the Nova instance that is the workload that uses the accelerator.

That means that Cyborg resources should be treated as just another resource class whose usage should be checked in a single query to the /usages placement API endpoint before attempting to spawn the instance (again, via Nova) that ends up consuming those resources.
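
For example, such a pre-spawn check can be a single GET against placement. A sketch, in which the endpoint, the token handling, and the CUSTOM_FPGA resource class are placeholders (GET /usages with a project_id filter is available from placement microversion 1.9):

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # placeholder
    HEADERS = {
        'X-Auth-Token': '<token from keystone>',
        'OpenStack-API-Version': 'placement 1.9',
    }

    def accelerator_usage(project_id):
        # One query returns current usage for every resource class the
        # project consumes, accelerators included.
        resp = requests.get(PLACEMENT + '/usages',
                            params={'project_id': project_id},
                            headers=HEADERS)
        resp.raise_for_status()
        return resp.json()['usages'].get('CUSTOM_FPGA', 0)
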

The claiming of all resources that are consumed by a Nova instance (which would include any accelerator resources) is an atomic operation that prevents over-allocation of any provider involved in the claim transaction. [2]
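
At the placement API level, that claim is a single PUT of all the consumer's allocations. A hedged sketch (the dict body format shown requires placement microversion 1.12; the provider and the resource amounts are illustrative):

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # placeholder

    def claim(consumer_uuid, provider_uuid, project_id, user_id):
        body = {
            'allocations': {
                provider_uuid: {
                    'resources': {'VCPU': 1, 'CUSTOM_FPGA': 1},
                },
            },
            'project_id': project_id,
            'user_id': user_id,
        }
        # Placement applies the whole body in one transaction and
        # returns 409 if any provider lacks capacity, so two racing
        # claims cannot both squeeze past the inventory limit.
        resp = requests.put(
            '%s/allocations/%s' % (PLACEMENT, consumer_uuid),
            json=body,
            headers={'X-Auth-Token': '<token from keystone>',
                     'OpenStack-API-Version': 'placement 1.12'})
        return resp.status_code == 204
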

This atomic operation in Nova/Placement *significantly* cuts down on the chances of a user/project exceeding its quota, because it shrinks the window between an accurate read of resource usage and the claim itself from seconds (or tens of seconds) to milliseconds.

So, to sum up, my recommendation is to get involved in the two Nova specs above and help to see them to completion in Rocky. Doing so will free Cyborg developers up to focus on integration with the virt driver layer via the os-acc library, implementing the update_provider_tree() interface, and coming up with some standard resource classes for describing accelerated resources.
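
For reference, reporting accelerator inventory through that interface might look roughly like the following; the CUSTOM_FPGA class and the totals are made up, while update_provider_tree() is the Nova virt driver method named above:

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        # Merge accelerator inventory into whatever the compute node
        # provider already reports.
        inventory = provider_tree.data(nodename).inventory
        inventory['CUSTOM_FPGA'] = {
            'total': 4,            # devices discovered on this host
            'reserved': 0,
            'min_unit': 1,
            'max_unit': 4,
            'step_size': 1,
            'allocation_ratio': 1.0,
        }
        provider_tree.update_inventory(nodename, inventory)
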

Best,
-jay

[1] I'm explicitly calling out multiple concurrent single-instance build requests here: a single build request for multiple instances is not a cause of over-quota, because the entire set of requested instances is considered as a single unit for the usage calculation.

[2] Technically, NUMA topology resources and PCI devices do not currently participate in this single claim transaction. This is not ideal, and it is something we are actively working on addressing. Keep in mind, though, that there are no quota classes for PCI devices or NUMA topologies, so the over-quota problem doesn't exist for those resources.
