On 04/21/2016 07:07 PM, Jay Pipes wrote:
Hmm, where do I start... I think I will just cut to the three primary
disagreements I have. And I will top-post because this email is way too
big.

1) On serializable isolation level.

No, you don't need it at all to prevent races in claiming. Just use a
compare-and-update with retries strategy. Proof is here:

https://github.com/jaypipes/placement-bench/blob/master/placement.py#L97-L142


Works great and prevents multiple writers from oversubscribing any
resource without relying on any particular isolation level at all.

The `generation` field in the inventories table is what allows multiple
writers to ensure a consistent view of the data without needing to rely
on heavy lock-based semantics and/or RDBMS-specific isolation levels.
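For those who don't want to click through, the gist of the approach is
below: a minimal sketch in SQLAlchemy, with an illustrative schema, not
the actual placement-bench code.

import sqlalchemy as sa

def claim(conn, inv_tbl, provider_id, resource_class, amount, max_retries=3):
    """Sketch of an optimistic, compare-and-update resource claim.
    Assumes an inventories table with (resource_provider_id,
    resource_class, total, used, generation) columns; names are
    illustrative, not the real placement schema."""
    for _ in range(max_retries):
        # Read the current usage together with the generation marker.
        row = conn.execute(
            sa.select(inv_tbl.c.used, inv_tbl.c.total, inv_tbl.c.generation)
            .where(inv_tbl.c.resource_provider_id == provider_id)
            .where(inv_tbl.c.resource_class == resource_class)
        ).one()
        if row.used + amount > row.total:
            return False  # claim would oversubscribe; fail outright
        # The WHERE on generation makes this a compare-and-update: the
        # write succeeds only if nobody else claimed since we read.
        res = conn.execute(
            sa.update(inv_tbl)
            .where(inv_tbl.c.resource_provider_id == provider_id)
            .where(inv_tbl.c.resource_class == resource_class)
            .where(inv_tbl.c.generation == row.generation)
            .values(used=row.used + amount, generation=row.generation + 1)
        )
        if res.rowcount == 1:
            return True  # claim recorded
        # Lost the race: another writer bumped the generation. Retry.
    return False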

2) On reservations.

The reason I don't believe reservations need to be in a quota library is
that reservations add a concept of time to a claim of some resource. You
reserve some resource to be claimed at some point in the future and
release those resources at a point further in time.

Quota checking doesn't look at what the state of some system will be at
some point in the future. It simply returns whether the system *right
now* can handle a request *right now* to claim a set of resources.

If you want reservation semantics for some resource, that's totally
cool, but IMHO, a reservation service should live outside of the service
that is actually responsible for providing resources to a consumer.
Merging right-now quota checks and future-based reservations into the
same library just complicates things unnecessarily IMHO.

3) On resizes.

Look, I recognize some users see some value in resizing their resources.
That's fine. I personally think expand operations are fine, and that
shrink operations are really the operations that should be prohibited in
the API. But, whatever, I'm fine with resizing of requested resource
amounts. My big point is that if you don't have a separate table that
stores quota_usages, and instead have only a single table that stores the
actual resource usage records, you don't have to do *any* quota check
operations at all upon deletion of a resource. For modifying resource
amounts (i.e. a resize) you merely need to change the calculation of
requested resource amounts to account for the already-consumed usage
amount.
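Concretely, that calculation is just a delta check against the actual
usage records. A sketch (SQLAlchemy-flavored; the function and schema
names are made up):

import sqlalchemy as sa

def can_resize(conn, usage_tbl, user_id, resource_class,
               old_amount, new_amount, limit):
    """Quota check for a resize: compare only the *delta* between new
    and old amounts against the remaining headroom. Shrinks (negative
    delta) always pass. Table and column names are illustrative."""
    delta = new_amount - old_amount
    if delta <= 0:
        return True  # shrinking or a no-op never violates a limit
    used = conn.execute(
        sa.select(sa.func.coalesce(sa.func.sum(usage_tbl.c.amount), 0))
        .where(usage_tbl.c.user_id == user_id)
        .where(usage_tbl.c.resource_class == resource_class)
    ).scalar_one()
    # `used` already includes old_amount for this resource, so checking
    # used + delta accounts for the already-consumed usage.
    return used + delta <= limit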

Bottom line for me: I really won't support any proposal for a complex
library that takes the resource claim process out of the hands of the
services that own those resources. The simpler the interface of this
library, the better.

I agree with every word that Jay has written here. I especially agree with point 1, and in fact have been in favor of that approach over the current system of table locks in the nova quota code for about as long as there has been nova quota code. But all three points are spot on.

On 04/19/2016 09:59 PM, Amrith Kumar wrote:
-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com]
Sent: Monday, April 18, 2016 2:54 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] More on the topic of DELIMITER, the Quota
Management Library proposal

On 04/16/2016 05:51 PM, Amrith Kumar wrote:
If we therefore assume that this will be a Quota Management Library,
it is safe to assume that quotas are going to be managed on a
per-project basis, where participating projects will use this library.
I believe that it stands to reason that any data persistence will have
to be in a location decided by the individual project.

Depends on what you mean by "any data persistence". If you are referring
to the storage of quota values (per user, per tenant, global, etc) I
think
that should be done by the Keystone service. This data is essentially an
attribute of the user or the tenant or the service endpoint itself (i.e.
global defaults). This data also rarely changes and logically belongs to
the service that manages users, tenants, and service endpoints:
Keystone.

If you are referring to the storage of resource usage records, yes, each
service project should own that data (and frankly, I don't see a need to
persist any quota usage data at all, as I mentioned in a previous
reply to
Attila).


[amrith] You make a distinction that I had made implicitly, and it is
important
to highlight it. Thanks for pointing it out. Yes, I meant both of the
above, and as stipulated. Global defaults in keystone (somehow, TBD) and
usage records, on a per-service basis.


That may not be a very interesting statement but the corollary is, I
think, a very significant statement; it cannot be assumed that the
quota management information for all participating projects is in the
same database.

It cannot be assumed that this information is even in a database at
all...


[amrith] I don't follow. If the service in question is to be scalable,
I think it
stands to reason that there must be some mechanism by which instances of
the service can share usage records (as you refer to them, and I like
that term). I think it stands to reason that there must be some
database, no?

A hypothetical service consuming the Delimiter library provides
requesters with some widgets, and wishes to track the widgets that it
has provisioned both on a per-user basis and on the whole. It should
therefore be multi-tenant and able to track the widgets on a per-tenant
basis and, if required, impose limits on the number of widgets that a
tenant may consume at a time, during the course of a period of time, and
so on.

No, this last part is absolutely not what I think quota management
should
be about.

Rate limiting -- i.e. how many requests a particular user can make of an
API in a given period of time -- should *not* be handled by OpenStack
API
services, IMHO. It is the responsibility of the deployer to handle this
using off-the-shelf rate-limiting solutions (open source or
proprietary).

Quotas should only be about the hard limit of different types of
resources
that a user or group of users can consume at a given time.

[amrith] OK, good point. Agreed as stipulated.

Such a hypothetical service may also consume resources from other
services that it wishes to track, and impose limits on.

Yes, absolutely agreed.

It is also understood as Jay Pipes points out in [4] that the actual
process of provisioning widgets could be time consuming and it is
ill-advised to hold a database transaction of any kind open for that
duration of time. Ensuring that a user does not exceed some limit on
the number of concurrent widgets that he or she may create therefore
requires some mechanism to track in-flight requests for widgets. I
view these as "intent" that has not yet materialized.

It has nothing to do with the number of concurrent widgets that a
user can create. It's just about the total number of some resource that
may be consumed by that user.

As for an "intent", I don't believe tracking intent is the right way
to go
at all. As I've mentioned before, the major problem in Nova's quota
system
is that there are two tables storing resource usage records: the
*actual* resource usage tables (the allocations table in the new
resource-
providers modeling and the instance_extra, pci_devices and instances
table
in the legacy modeling) and the *quota usage* tables (quota_usages and
reservations tables). The quota_usages table does not need to exist at
all, and neither does the reservations table. Don't do intent-based
consumption. Instead, just consume (claim) by writing a record for the
resource class consumed on a provider into the actual resource usages
table and then "check quotas" by querying the *actual* resource
usages and
comparing the SUM(used) values, grouped by resource class, against the
appropriate quota limits for the user. The introduction of the
quota_usages and reservations tables to cache usage records is the
primary
reason for the race problems in the Nova (and
other) quota system because every time you introduce a caching system
for
highly-volatile data (like usage records) you introduce complexity into
the write path and the need to track the same thing across multiple
writes
to different tables needlessly.
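In other words, the whole check collapses to one aggregate query over
the real usage table. A sketch (simplified, made-up schema):

import sqlalchemy as sa

def over_quota(conn, alloc_tbl, user_id, requested, limits):
    """Check a request dict {resource_class: amount} against per-user
    limits using only the *actual* usage records, no quota_usages cache.
    Table and column names are illustrative."""
    rows = conn.execute(
        sa.select(alloc_tbl.c.resource_class,
                  sa.func.sum(alloc_tbl.c.used).label("used"))
        .where(alloc_tbl.c.user_id == user_id)
        .group_by(alloc_tbl.c.resource_class)
    )
    used_by_class = {r.resource_class: r.used for r in rows}
    return any(
        used_by_class.get(rc, 0) + amount > limits[rc]
        for rc, amount in requested.items()
    )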


[amrith] I don't agree, I'll respond to this and the next comment
group together. See below.

Looking up at this whole infrastructure from the perspective of the
database, I think we should require that the database must not be
required to operate in any isolation mode higher than READ-COMMITTED;
more about that later (i.e. requiring that a database run at either
serializable or repeatable read is a show stopper).

This is an implementation detail that is not relevant to the discussion
about what the interface of a quota library would look like.


[amrith] I disagree, let me give you an example of why.

Earlier, I wrote:

Such a hypothetical service may also consume resources from other
services that it wishes to track, and impose limits on.


And you responded:

Yes, absolutely agreed.

So let's take this hypothetical service that in response to a user
request, will provision a Cinder volume and a Nova instance. Let's
assume that the service also imposes limits on the number of cinder
volumes and nova instances the user may provision; independent of
limits that Nova and Cinder may themselves maintain.

One way that the hypothetical service can function is this:

(a) check Cinder quota, if successful, create cinder volume
(b) check Nova quota, if successful, create nova instance with cinder
volume attachment

Now, this is sub-optimal, as there are going to be some number of cases
where the Nova quota check fails; you have then needlessly created, and
will have to release, a Cinder volume. It also takes longer to fail.

Another way to do this is this:

(1) check Cinder quota, if successful, check Nova quota, if successful
proceed to (2) else error out
(2) create cinder volume
(3) create nova instance with cinder attachment.

I'm trying to get to this latter form of doing things.

Easy, you might say ... theoretically this should simply be:

       BEGIN;
       -- Get data to do the Cinder check

       SELECT ......

       -- Do the cinder check

       INSERT INTO ....

       -- Get data to do the Nova check

       SELECT ....

       -- Do the Nova check

       INSERT INTO ...

       COMMIT

You can only make this work if you run at isolation level
serializable. Why?

To make this run at isolation level REPEATABLE-READ, you must enforce
constraints at the database level that will fail the commit. But wait,
you can't do that, because the data about the global limits may not be
in the same database as the usage records. Later you talk about
caching and stuff; none of that helps a database constraint.

For this reason, I think there is going to have to be some cognizance
of the database isolation level in the design of the library, and I
think it will also impact the API that can be constructed.
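For example, to make the two-check sequence above safe at
READ-COMMITTED, you end up doing each check as its own atomic
compare-and-update and compensating by hand when a later check fails,
which is a visibly different API shape. A sketch (both claim callables
are hypothetical):

class QuotaExceeded(Exception):
    pass

def check_both_quotas(claim_cinder_quota, claim_nova_quota):
    """Sketch: two quota checks that must both pass before any real
    resource is created. At READ-COMMITTED each claim is its own atomic
    compare-and-update (e.g. a generation-guarded UPDATE); there is no
    single transaction spanning both, so a failure of the second claim
    must be compensated by undoing the first. Each hypothetical callable
    returns an undo callable on success, or None on failure."""
    undo_cinder = claim_cinder_quota()
    if undo_cinder is None:
        raise QuotaExceeded('cinder')
    undo_nova = claim_nova_quota()
    if undo_nova is None:
        undo_cinder()  # roll back the first claim by hand
        raise QuotaExceeded('nova')
    # Both claims held; safe to create the volume and the instance.
    return undo_cinder, undo_nova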

In general therefore, I believe that the hypothetical service
processing requests for widgets would have to handle three kinds of
operations, provision, modify, and destroy. The names are, I believe,
self-explanatory.

Generally, modification of a resource doesn't come into play. The
primary exception to this is transferring ownership of some resource.


[amrith] Trove RESIZE is a huge benefit for users and while it may be
a pain as you say, this is still a very real benefit. Trove allows you
to resize both your storage (resize the cinder volume) and resize your
instance (change the flavor).


Without loss of generality, one can say that all three of them must
validate that the operation does not violate some limit (no more than
X widgets, no fewer than X widgets, rates, and so on).

No, only the creation (and very rarely the modification) needs any
validation that a limit would be violated. Destroying a resource never
needs to be checked for limit violations.


[amrith] Well, if you are going to create a volume of 10GB and your
limit is 100GB, resizing it to 200GB should fail, I think.

Assuming that the service provisions resources from other services, it
is also conceivable that limits be imposed on the quantum of those
services consumed. In practice, I can imagine a service like Trove
using the Delimiter project to perform all of these kinds of limit
checks; I'm not suggesting that it does this today, nor that there is
an immediate plan to implement all of them, just that these all seem
like good uses of a Quota Management capability.

          - User may not have more than 25 database instances at a time
          - User may not have more than 4 clusters at a time
          - User may not consume more than 3TB of SSD storage at a time

Only if SSD storage is a distinct resource class from DISK_GB. Right
now,
Nova makes no differentiation w.r.t. SSD or HDD or shared vs. local
block
storage.


[amrith] It matters not to Trove whether Nova does or not. Cinder
supports volume-types and users DO want to limit based on volume-type
(for example).

          - User may not launch more than 10 huge instances at a time

What is the point of such a limit?


[amrith] Metering usage, placing limitations on the quantum of
resources that a user may provision. Same as with Nova. A flavor is
merely a simple way to tie together a bag of resources. It is a way to
restrict access, for example, to specific resources that are available
in the cloud. HUGE is just an example I gave, pick any flavor you
want, and here's how a service like Trove uses it.

Users can ask to launch an instance of a specific database+version;
MySQL 5.6-48 for example. Now, an operator can restrict the instance
flavors, or volume types that can be associated with the specific
datastore. And the flavor could be used to map to, for example, whether
the instance is running on bare metal or in a VM, and if so, with what
kind of hardware. That's a useful construct for a service like Trove.

          - User may not launch more than 3 clusters an hour

-1. This is rate limiting and should be handled by rate-limiting
services.

          - No more than 500 copies of Oracle may be run at a time

Is "Oracle" a resource class?


[amrith] As I view it, every project should be free to define its own
set of resource classes and meter them as it sees fit. So, while
Oracle licenses may not matter to Nova or Cinder, conceivably a lot of
things that the core projects don't care about are in fact relevant for
a consumer of this library.

While Nova would be the service that limits the number of instances a
user can have at a time, the ability for a service to limit this
further should not be underestimated.

In turn, should Nova and Cinder also use the same Quota Management
Library, they may each impose limitations like:

          - User may not launch more than 20 huge instances at a time

Not a useful limitation IMHO.


[amrith] I beg to differ. Again a huge instance is just an example of
some flavor; and the idea is to allow a project to place its own
metrics and meter based on those.

          - User may not launch more than 3 instances in a minute

-1. This is rate limiting.

          - User may not consume more than 15TB of SSD at a time
          - User may not have more than 30 volumes at a time

Again, I'm not implying that either Nova or Cinder should provide
these capabilities.

With this in mind, I believe that the minimal set of operations that
Delimiter should provide are:

          - define_resource(name, max, min, user_max, user_min, ...)

What would the above do? What service would it be speaking to?

[amrith] I assume that this would speak with some backend (either
keystone or the project itself) and record these designated limits.
This is the way to register a project specific metric like "Oracle
licenses".


          - update_resource_limits(name, user, user_max, user_min, ...)

This doesn't belong in a quota library. It belongs as a REST API in
Keystone.


[amrith] Fine, same place where the previous thing stores the global
defaults is the target of this call.


          - reserve_resource(name, user, size, parent_resource, ...)

This doesn't belong in a quota library at all. I think reservations are
not germane to resource consumption and should be handled by an external
service at the orchestration layer.


[amrith] Again not true; as illustrated above, this library is the
thing that projects could use to determine whether or not to honor a
request. This reserve/provision process is, I believe, required because
of the vagaries of how we want to implement this in the database.

          - provision_resource(resource, id)

A quota library should not be provisioning anything. A quota library
should simply provide a consistent interface for *checking* that a
structured request for some set of resources *can* be provided by the
service.


[amrith] This does not actually call Nova or anything; the point is
merely that AFTER the hypothetical service has called Nova, this
converts the reservation (which can expire) into an actual allocation.
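A sketch of what that conversion might look like (entirely hypothetical
schema and names):

import sqlalchemy as sa

def provision_resource(conn, res_tbl, reservation_id, external_uuid, now):
    """Sketch: turn a still-valid reservation into a permanent
    allocation by stamping it with the UUID of the real (e.g. Cinder)
    resource and clearing its expiry. The guarded UPDATE fails if the
    reservation already expired. Hypothetical schema:
    reservations(id, external_id, state, expires_at)."""
    res = conn.execute(
        sa.update(res_tbl)
        .where(res_tbl.c.id == reservation_id)
        .where(res_tbl.c.state == 'reserved')
        .where(res_tbl.c.expires_at > now)  # reservation must still be live
        .values(external_id=external_uuid, state='provisioned',
                expires_at=None)
    )
    if res.rowcount != 1:
        raise LookupError('reservation %s expired or missing'
                          % reservation_id)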

          - update_resource(id or resource, newsize)

Resizing resources is a bad idea, IMHO. Resources are easier to deal
with when they are considered of immutable size and simple (i.e. not
complex or nested). I think the problem here is an improper definition
of resource classes.


[amrith] Let's leave the quota library aside. This assertion strikes
at the very heart of things like Nova resize, or for that matter
Cinder volume resize. Are those all bad ideas? I made a 500GB Cinder
volume and it is getting close to full. I'd like to resize it to 2TB.
Are you saying that's not a valid use case?

For example, a "cluster" is not a resource. It is a collection of
resources of type node. "Resizing" a cluster is a misnomer, because you
aren't resizing a resource at all. Instead, you are creating or
destroying
resources inside the cluster (i.e. joining or leaving cluster nodes).

BTW, this is also why the "resize instance" API in Nova is such a giant
pain in the ass. It's attempting to "modify" the instance "resource"
when the instance isn't really the resource at all. The VCPU, RAM_MB,
DISK_GB, and PCI devices are the actual resources. The instance is a
convenient way to tie those resources together, and doing a "resize" of
the instance behind the scenes actually performs a *move* operation,
which
isn't a *change* of the original resources. Rather, it is a creation
of a
new set of resources (of the new amounts) and a deletion of the old
set of
resources.


[amrith] That's fine; if all we want is to handle the resize operation
as a new instance followed by a deletion, that's great. But that
semantic isn't necessarily the case for something like (say) Cinder.

The "resize" API call adds some nasty confirmation and cancel
semantics to
the calling interface that hint that the underlying implementation of
the
"resize" operation is in actuality not a resize at all, but rather a
create-new-and-delete-old-resources operation.

[amrith] And that isn't germane to a quota library, I don't think.
What is, is this: do we want to account for the transient state when
there are (in the case of Nova, for example) two instances, one of the
new flavor and one of the old, or not? But, from the perspective of a
quota library, a resize operation is merely a reset of the quota by the
delta in the resource consumed.



          - release_resource(id or resource)
          - expire_reservations()

I see no need to have reservations in the quota library at all, as
mentioned above.


[amrith] Then I think the quota library must require that either (a)
the underlying database runs serializable or (b) database constraints
can be used to enforce that at commit the global limits are adhered to.

As for your proposed interface and calling structure below, I think a
much
simpler proposal would work better. I'll work on a cross-project spec
that
describes this simpler proposal, but the basics would be:

1) Have Keystone store quota information for defaults (per service
endpoint), for tenants and for users.

Keystone would have the set of canonical resource class names, and each
project, upon handling a new resource class, would be responsible for a
change submitted to Keystone to add the new resource class code.

Straw man REST API:

GET /quotas/resource-classes
200 OK
{
    "resource_classes": {
      "compute.vcpu": {
        "service": "compute",
        "code": "compute.vcpu",
        "description": "A virtual CPU unit"
      },
      "compute.ram_mb": {
        "service": "compute",
        "code": "compute.ram_mb",
        "description": "Memory in megabytes"
      },
      ...
      "volume.disk_gb": {
        "service": "volume",
        "code": "volume.disk_gb",
        "description": "Amount of disk space in gigabytes"
      },
      ...
      "database.count": {
         "service": "database",
         "code": "database.count",
         "description": "Number of database instances"
      }
    }
}


[amrith] Well, a user is allowed to have a certain compute quota
(which is shared by Nova and Trove) but also a Trove quota. How would
your representation represent that?

# Get the default limits for new users...
GET /quotas/defaults
200 OK
{
    "quotas": {
      "compute.vcpu": 100,
      "compute.ram_mb": 32768,
      "volume.disk_gb": 1000,
      "database.count": 25
    }
}

# Get a specific user's limits...
GET /quotas/users/{UUID}
200 OK
{
    "quotas": {
      "compute.vcpu": 100,
      "compute.ram_mb": 32768,
      "volume.disk_gb": 1000,
      "database.count": 25
    }
}

# Get a tenant's limits...
GET /quotas/tenants/{UUID}
200 OK
{
    "quotas": {
      "compute.vcpu": 1000,
      "compute.ram_mb": 327680,
      "volume.disk_gb": 10000,
      "database.count": 250
    }
}

2) Have Delimiter communicate with the above proposed new Keystone REST
API and package up data into an oslo.versioned_objects interface.

Clearly all of the above can be heavily cached both on the server and
client side since they rarely change but are read often.


[amrith] Caching on the client won't save you from oversubscription if
you don't run serializable.

The Delimiter library could be used to provide a calling interface for
service projects to get a user's limits for a set of resource classes:

(please excuse wrongness, typos, and other stuff below; it's just a
straw-man, not production working code...)

# file: delimiter/objects/limits.py
import oslo.versioned_objects.base as ovo
import oslo.versioned_objects.fields as ovo_fields


class ResourceLimit(ovo.VersionedObjectBase):
    # 1.0: Initial version
    VERSION = '1.0'

    fields = {
        'resource_class': ovo_fields.StringField(),
        'amount': ovo_fields.IntegerField(),
    }


class ResourceLimitList(ovo.VersionedObjectBase):
    # 1.0: Initial version
    VERSION = '1.0'

    fields = {
      'resources': ovo_fields.ListOfObjectsField(ResourceLimit),
    }

    @cache_this_heavily
    @remotable_classmethod
    def get_all_by_user(cls, user_uuid):
      """Returns a ResourceLimitList object that tells the caller what a
      user's absolute limits are for the set of resource classes in the
      system.
      """
      # Grab a keystone client session object and connect to Keystone
      ks = ksclient.Session(...)
      raw_limits = ks.get_limits_by_user(user_uuid)
      return cls(resources=[ResourceLimit(**d) for d in raw_limits])

3) Each service project would be responsible for handling the
consumption
of a set of requested resource amounts in an atomic and consistent way.

[amrith] This is where the rubber meets the road. What is that atomic
and consistent way? And what computing infrastructure do you need to
deliver this?

The Delimiter library would return the limits that the service would
pre-check before claiming the resources, and either post-check after
claim or utilize a compare-and-update technique with a
generation/timestamp during claiming to prevent race conditions.

For instance, in Nova with the new resource providers database schema
and
doing claims in the scheduler (a proposed change), we might do something
to the effect of:

from delimiter import objects as delim_obj
from delimiter import exceptions as delim_exc
from nova import objects as nova_obj

request = nova_obj.RequestSpec.get_by_uuid(request_uuid)
requested = request.resources
limits = delim_obj.ResourceLimitList.get_all_by_user(user_uuid)
allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)

# Pre-check for violations
for resource_class, requested_amount in requested.items():
    limit_idx = limits.resources.index(resource_class)
    resource_limit = limits.resources[limit_idx].amount
    alloc_idx = allocations.resources.index(resource_class)
    resource_used = allocations.resources[alloc_idx].amount
    if (resource_used + requested_amount) > resource_limit:
        raise delim_exc.QuotaExceeded


[amrith] Is the above code run with some global mutex to prevent two
people from concluding that they are good on quota at the same time?

# Do claims in scheduler in an atomic, consistent fashion...
claims = scheduler_client.claim_resources(request)


[amrith] Yes, each 'atomic' claim on a repeatable-read database could
result in oversubscription.

# Post-check for violations
allocations = nova_obj.AllocationList.get_all_by_user(user_uuid)
# allocations now include the claimed resources from the scheduler

for resource_class, requested_amount in requested.items():
    limit_idx = limits.resources.index(resource_class)
    resource_limit = limits.resources[limit_idx].amount
    alloc_idx = allocations.resources.index(resource_class)
    resource_used = allocations.resources[alloc_idx].amount
    if resource_used > resource_limit:
        # Delete the allocation records for the resources just claimed
        delete_resources(claims)
        raise delim_exc.QuotaExceeded


[amrith] Again, two people could drive through this code and both of
them could fail :(

4) The only other thing that would need to be done for a first go of the
Delimiter library is some event listener that can listen for changes to
the quota limits for a user/tenant/default in Keystone. We'd want the
services to be able to notify someone if a reduction in quota results in
an over-quota situation.
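Assuming Keystone emitted standard notifications for such changes, the
listener side could be as simple as the following sketch (the event name
and the recheck_user_usage() helper are hypothetical):

from oslo_config import cfg
import oslo_messaging

class QuotaChangeEndpoint(object):
    # Called for INFO-level notifications; the signature follows the
    # oslo.messaging notification-endpoint convention.
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if event_type == 'quota.limit.updated':  # hypothetical event name
            # Hypothetical helper: re-check the user's usage against the
            # new limits and notify if they are now over quota.
            recheck_user_usage(payload['user_id'])

transport = oslo_messaging.get_notification_transport(cfg.CONF)
targets = [oslo_messaging.Target(topic='notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [QuotaChangeEndpoint()])
listener.start()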

Anyway, that's my idea. Keep the Delimiter library small and focused on
describing the limits only, not on the resource allocations. Have the
Delimiter library present a versioned object interface so that the
interface to the data exposed by the Keystone REST API for quotas can
evolve naturally and smoothly over time.

Best,
-jay

Let me illustrate the way I see these things fitting together. A
hypothetical Trove system may be set up as follows:

          - No more than 2000 database instances in total, 300 clusters
          in total
          - Users may not launch more than 25 database instances, or 4
          clusters
          - The particular user 'amrith' is limited to 2 databases and 1
          cluster
          - No user may consume more than 20TB of storage at a time
          - No user may consume more than 10GB of memory at a time

At startup, I believe that the system would make the following
sequence of calls:

          - define_resource(databaseInstance, 2000, 0, 25, 0, ...)
          - update_resource_limits(databaseInstance, amrith, 2, 0, ...)
          - define_resource(databaseCluster, 300, 0, 4, 0, ...)
          - update_resource_limits(databaseCluster, amrith, 1, 0, ...)
          - define_resource(storage, -1, 0, 20TB, 0, ...)
          - define_resource(memory, -1, 0, 10GB, 0, ...)

Assume that the user john comes along and asks for a cluster with 4
nodes, 1TB of storage per node, and 1GB of memory per node; the system
would go through the following sequence:

          - reserve_resource(databaseCluster, john, 1, None)
                  o this returns a resourceID (say cluster-resource-id)
                  o the cluster instance that it reserves counts against
                  the limit of 300 cluster instances in total, as well as
                  the 4 clusters that john can provision. If 'amrith' had
                  requested it, that would have been counted against the
                  limit of 2 clusters for the user.

          - reserve_resource(databaseInstance, john, 1,
          cluster-resource-id)
          - reserve_resource(databaseInstance, john, 1,
          cluster-resource-id)
          - reserve_resource(databaseInstance, john, 1,
          cluster-resource-id)
          - reserve_resource(databaseInstance, john, 1,
          cluster-resource-id)
                  o this returns four resource IDs, let's say
                  instance-1-id, instance-2-id, instance-3-id,
                  instance-4-id
                  o note that each instance is just that, an instance by
                  itself. It is therefore not right to consider this as
                  equivalent to a call to reserve_resource() with a size
                  of 4, especially because each instance could later be
                  tracked as an individual Nova instance.

          - reserve_resource(storage, john, 1TB, instance-1-id)
          - reserve_resource(storage, john, 1TB, instance-2-id)
          - reserve_resource(storage, john, 1TB, instance-3-id)
          - reserve_resource(storage, john, 1TB, instance-4-id)
                  o each of them returns some resourceID, let's say they
                  returned cinder-1-id, cinder-2-id, cinder-3-id,
                  cinder-4-id
                  o since the storage of 1TB is a unit, it is treated as
                  such. In other words, you don't need to invoke
                  reserve_resource 10^12 times, once per byte allocated :)

          - reserve_resource(memory, john, 1GB, instance-1-id)
          - reserve_resource(memory, john, 1GB, instance-2-id)
          - reserve_resource(memory, john, 1GB, instance-3-id)
          - reserve_resource(memory, john, 1GB, instance-4-id)
                  o each of these returns something, say
                  Dg4KBQcODAENBQEGBAcEDA, CgMJAg8FBQ8GDwgLBA8FAg,
                  BAQJBwYMDwAIAA0DBAkNAg, AQMLDA4OAgEBCQ0MBAMGCA. I have
                  made up arbitrary strings just to highlight that we
                  really don't track these anywhere, so we don't care
                  about them.

If all this works, then the system knows that john's request does not
violate any quotas that it can enforce; it can then go ahead and
launch the instances (calling Nova), provision storage, and so on.

The system then goes and creates four Cinder volumes, these are
cinder-1-uuid, cinder-2-uuid, cinder-3-uuid, cinder-4-uuid.

It can then go and confirm those reservations.

          - provision_resource(cinder-1-id, cinder-1-uuid)
          - provision_resource(cinder-2-id, cinder-2-uuid)
          - provision_resource(cinder-3-id, cinder-3-uuid)
          - provision_resource(cinder-4-id, cinder-4-uuid)

It could then go and launch 4 Nova instances and similarly provision
those resources, and so on. This process could take some minutes, and
holding a database transaction open for this is the issue that Jay
brings up in [4]. We don't have to hold one in this proposed scheme.

Since the resources are all hierarchically linked through the overall
cluster id, when the cluster is set up, it can finally go and provision
that:

- provision_resource(cluster-resource-id, cluster-uuid)

When Trove is done with some individual resource, it can go and
release it. Note that I'm thinking this will invoke release_resource
with the ID of the underlying object OR the resource.

          - release_resource(cinder-4-id), and
          - release_resource(cinder-4-uuid)

are therefore identical and indicate that the 4th 1TB volume is now
released. How this will be implemented in Python (kwargs or some other
mechanism) is, I believe, an implementation detail.

Finally, it releases the cluster resource by doing this:

          - release_resource(cluster-resource-id)

This would release the cluster and all dependent resources in a single
operation.

A user may wish to modify a resource that was provisioned by the
service. Assume that this results in a resizing of the instances; then
it is a matter of updating that resource.

Assume that the third 1TB volume is being resized to 2TB, then it is
merely a matter of invoking:

          - update_resource(cinder-3-uuid, 2TB)

Delimiter can go figure out that cinder-3-uuid is a 1TB device and
therefore this is an increase of 1TB and verify that this is within
the quotas allowed for the user.

The thing that I find attractive about this model of maintaining a
hierarchy of reservations is that in the event of an error, the
service need merely call release_resource() on the highest level
reservation and the Delimiter project can walk down the chain and
release all the resources or reservations as appropriate.
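A sketch of that cascading release (the self-referencing reservations
table is hypothetical):

import sqlalchemy as sa

def release_resource(conn, res_tbl, resource_id):
    """Sketch: release a reservation/allocation and, recursively,
    everything reserved underneath it. Assumes a hypothetical
    reservations(id, parent_id, ...) self-referencing table."""
    children = conn.execute(
        sa.select(res_tbl.c.id).where(res_tbl.c.parent_id == resource_id)
    ).scalars().all()
    for child_id in children:
        release_resource(conn, res_tbl, child_id)  # depth-first release
    conn.execute(sa.delete(res_tbl).where(res_tbl.c.id == resource_id))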

Under the covers I believe that each of these operations should be
atomic and may update multiple database tables, but these will all be
short-lived operations.

For example, reserving an instance resource would increment the number
of instances for the user as well as the number of instances on the
whole, and this would be an atomic operation.
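For example (hypothetical usage-counter schema; the guarded UPDATEs keep
this safe even at READ-COMMITTED):

import sqlalchemy as sa

class QuotaExceeded(Exception):
    pass

def reserve_instance(engine, usage_tbl, user_id, user_limit, global_limit):
    """Sketch: bump the per-user and the global instance counts in one
    short transaction. Each WHERE guard refuses the UPDATE if it would
    exceed its limit, and raising rolls the whole transaction back, so
    the two counters stay consistent. Hypothetical schema:
    usage(scope, count) with scope 'user:<id>' or 'global'."""
    with engine.begin() as conn:  # short-lived transaction
        for scope, limit in (('user:%s' % user_id, user_limit),
                             ('global', global_limit)):
            res = conn.execute(
                sa.update(usage_tbl)
                .where(usage_tbl.c.scope == scope)
                .where(usage_tbl.c.count < limit)  # guard against overshoot
                .values(count=usage_tbl.c.count + 1)
            )
            if res.rowcount != 1:
                # No row updated: limit reached; abort, rolling back both.
                raise QuotaExceeded(scope)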

I have two primary areas of concern about the proposal [3].

          The first is that it makes the implicit assumption that the
          "flat mode" is implemented. That provides value to a consumer
          but I think it leaves a lot for the consumer to do. For
          example, I find it hard to see how the model proposed would
          handle the release of quotas, let alone the case of a nested
          release of a hierarchy of resources.

          The other is the notion that the implementation will begin a
          transaction, perform a query(), make some manipulations, and
          then do a save(). This makes for an interesting transaction
          management challenge, as it would require the underlying
          database to run in an isolation mode of at least repeatable
          reads and maybe even serializable, which would be a performance
          bear on a heavily loaded system. If run in the traditional
          read-committed mode, this would silently lead to
          oversubscription and the violation of quota limits.

I believe that it should be a requirement that the Delimiter library
be able to run against a database that supports, and is configured
for, READ-COMMITTED, and should not require anything higher. The model
proposed above can certainly be implemented with a database running
READ-COMMITTED, and I believe that this is also true with the caveat
that the operations will be performed through SQLAlchemy.

Thanks,

-amrith

[1] http://openstack.markmail.org/thread/tkl2jcyvzgifniux
[2] http://openstack.markmail.org/thread/3cr7hoeqjmgyle2j
[3] https://review.openstack.org/#/c/284454/
[4] http://markmail.org/message/7ixvezcsj3uyiro6




