Relatively Cyborg-naive question here... I thought Cyborg was going to support a hot-plug model, so I certainly hope the expectation is not that accelerators will be encoded into Nova flavors? That would severely limit Cyborg's usefulness.
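For concreteness, my understanding of the flavor-encoded model is a placement resource-class extra spec baked into the flavor, roughly as in the sketch below using python-novaclient. The flavor name, credentials, and the CUSTOM_FPGA resource class are illustrative assumptions on my part, not anything Cyborg has decided:

    # Illustrative sketch only: flavor name, credentials and the
    # CUSTOM_FPGA resource class are assumptions for this example.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://keystone:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_id='default',
                                    project_domain_id='default')
    sess = session.Session(auth=auth)
    nova = client.Client('2.53', session=sess)

    # Statically bake one FPGA into the flavor; every instance booted
    # from this flavor then requests exactly one CUSTOM_FPGA resource
    # from placement.
    flavor = nova.flavors.find(name='fpga.small')
    flavor.set_keys({'resources:CUSTOM_FPGA': '1'})

Every accelerator variant then needs its own flavor, which is exactly the proliferation I'm worried about.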
On 19 May 2018 at 23:30, Jay Pipes <jaypi...@gmail.com> wrote:
> On 05/18/2018 07:58 AM, Nadathur, Sundar wrote:
>> Agreed. Not sure how other projects handle it, but here's the
>> situation for Cyborg. A request may get scheduled on a compute node
>> with no intervention by Cyborg, so the earliest check that can be
>> made today is on the selected compute node. A simple approach can
>> result in quota violations, as in this example.
>>
>> Say there are 5 devices in a cluster. A tenant has a quota of 4 and
>> is currently using 3. That leaves 2 unused devices, of which the
>> tenant is permitted to use only one. But the tenant may submit two
>> concurrent requests, and they may land on two different compute
>> nodes. The Cyborg agent on each node will see the current tenant
>> usage as 3 and let the request go through, resulting in a quota
>> violation.
>>
>> To prevent this, we need some kind of atomic update, like
>> SQLAlchemy's with_lockmode():
>>
>> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE
>>
>> That approach has issues, as documented in the link above. Also,
>> since every compute node would do this, it would serialize the
>> bringup of all instances with accelerators across the cluster.
>>
>> If there is a better solution, I'll be happy to hear it.
>
> The solution is to implement the following two specs:
>
> https://review.openstack.org/#/c/509042/
> https://review.openstack.org/#/c/569011/
>
> The problem of consuming more resources than a user/project has quota
> for is not a new one. Users have been able to go over their quota in
> all of the services for as long as I can remember -- they can do this
> by essentially DDoS'ing the API with lots of concurrent
> single-instance build requests [1] all at once. The tenant then ends
> up in an over-quota situation and is essentially unable to do
> anything at all until it deletes some resources.
>
> The only operators I can remember complaining about this issue were
> the public cloud operators -- and rightfully so, since quota abuse in
> public clouds meant their reputation for fairness might be
> questioned. Most operators I know of addressed this with
> *rate-limiting*, which is not the same as quota limiting. By
> rate-limiting requests to the APIs, they alleviated a symptom: high
> rates of concurrent requests leading to over-quota situations.
>
> Nobody is using Cyborg separately from Nova at the moment (or ever?).
> It's not as if a user will consume an accelerator outside of a Nova
> instance -- it is the Nova instance that is the workload using the
> accelerator.
>
> That means Cyborg resources should be treated as just another
> resource class whose usage is checked in a single query to the
> /usages placement API endpoint before attempting to spawn the
> instance (again, via Nova) that ends up consuming those resources.
>
> The claiming of all resources consumed by a Nova instance (which
> would include any accelerator resources) is an atomic operation that
> prevents over-allocation of any provider involved in the claim
> transaction. [2]
>
> This atomic operation in Nova/Placement *significantly* cuts down on
> the chances of a user/project exceeding its quota, because it reduces
> the time needed to get an accurate read of resource usage from
> seconds or tens of seconds to milliseconds.
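For reference, the pessimistic-locking approach Sundar describes above would look roughly like the sketch below in current SQLAlchemy (with_for_update() superseded the deprecated with_lockmode()). The Usage model is hypothetical, purely for illustration; the point is that every compute node would serialize on this row lock:

    # Sketch of the per-node pessimistic lock being argued against.
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Usage(Base):  # hypothetical model, for illustration only
        __tablename__ = 'accelerator_usage'
        project_id = Column(String(36), primary_key=True)
        used = Column(Integer, nullable=False)

    def claim_device(engine, project_id, quota):
        with Session(engine) as session, session.begin():
            # SELECT ... FOR UPDATE: concurrent claimants block here
            # until this transaction commits or rolls back.
            usage = (session.query(Usage)
                     .filter_by(project_id=project_id)
                     .with_for_update()
                     .one())
            if usage.used + 1 > quota:
                raise RuntimeError('accelerator quota exceeded')
            usage.used += 1

By contrast, the single /usages read Jay recommends is one GET against placement. A minimal sketch, assuming sess is a keystoneauth1 Session like the one above, with CUSTOM_FPGA and the quota check again being illustrative assumptions:

    def accelerator_usage(sess, project_id, rc='CUSTOM_FPGA'):
        # Placement microversion 1.9 added /usages with project_id
        # filtering; one GET replaces all per-node bookkeeping.
        resp = sess.get('/usages', params={'project_id': project_id},
                        endpoint_filter={'service_type': 'placement'},
                        headers={'OpenStack-API-Version': 'placement 1.9'})
        resp.raise_for_status()
        return resp.json()['usages'].get(rc, 0)

    def check_accelerator_quota(sess, project_id, requested, limit):
        if accelerator_usage(sess, project_id) + requested > limit:
            raise RuntimeError('accelerator quota exceeded')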
> So, to sum up, my recommendation is to get involved in the two Nova
> specs above and help see them through to completion in Rocky. Doing
> so will free Cyborg developers to focus on integration with the virt
> driver layer via the os-acc library, implementing the
> update_provider_tree() interface, and coming up with some standard
> resource classes for describing accelerator resources.
>
> Best,
> -jay
>
> [1] I'm explicitly calling out multiple concurrent single-instance
> build requests here, since a build request for multiple instances is
> actually not a cause of over-quota: the entire set of requested
> instances is considered as a single unit for the usage calculation.
>
> [2] Technically, NUMA topology resources and PCI devices do not
> currently participate in this single claim transaction. That is not
> ideal, and it is something we are actively working on addressing.
> Keep in mind, though, that there are also no quota classes for PCI
> devices or NUMA topologies, so the over-quota problem does not exist
> for those resource classes.

--
Cheers,
~Blairo