2016-11-03 4:52 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:

> On 11/01/2016 10:14 AM, Alex Xu wrote:
>
>> Currently we only update the resource usage with the Placement API in
>> the instance claim and the available resource update periodic task. But
>> there is no claim for migration with the placement API yet. This work is
>> tracked by https://bugs.launchpad.net/nova/+bug/1621709. In Newton, we
>> only fixed one bit, which makes the resource update periodic task work
>> correctly, so that it will auto-heal everything. The migration claim
>> part wasn't a goal for the Newton release.
>>
>> So the first question is: do we want to fix it in this release? If the
>> answer is yes, there is a concern we need to discuss.
>>
>
> Yes, I believe we should fix the underlying problem in Ocata. The
> underlying problem is what Sylvain brought up: live migrations do not
> currently use any sort of claim operation. The periodic resource audit is
> relied upon to essentially clean up the state of claimed resources over
> time, and as Chris points out in review comments on
> https://review.openstack.org/#/c/244489/, this leads to the scheduler
> operating on stale data and can lead to an increase in retry operations.
>
> This needs to be fixed before even attempting to address the issue you
> bring up with the placement API calls from the resource tracker.

OK, let me see if I can help with that.

>> In order to implement dropping a migration claim, the RT needs to
>> remove allocation records on a specific RP (on the source/destination
>> compute node). But there isn't any API that can do that. The API for
>> removing allocation records is 'DELETE /allocations/{consumer_uuid}',
>> but it deletes all the allocation records for the consumer. So the
>> initial fix (https://review.openstack.org/#/c/369172/) adds a new API,
>> 'DELETE /resource_providers/{rp_uuid}/allocations/{consumer_id}'. But
>> Chris Dent pointed out that this goes against the original design: all
>> the allocations for a specific consumer can only be dropped together.
>>
>
> Yes, and this is by design. Consumption of resources -- or the freeing
> thereof -- must be an atomic, transactional operation.
>
>> There is also a suggestion from Andrew: we can update all the
>> allocation records for the consumer each time. That means the RT will
>> build the original allocation records and the new allocation records
>> for the claim together, and put them into one API call. That API should
>> be 'PUT /allocations/{consumer_uuid}'. Unfortunately that API doesn't
>> replace all the allocation records for the consumer; it always amends
>> the new allocation records for the consumer.
>>
>
> I see no reason why we can't change the behaviour of the `PUT
> /allocations/{consumer_uuid}` call to allow changing either the amounts of
> the allocated resources (a resize operation) or the set of resource
> provider UUIDs referenced in the allocations list (a move operation).
>
> For instance, let's say we have an allocation for an instance "i1" that is
> consuming 2 VCPU and 2048 MEMORY_MB on compute node "rpA", and 50 DISK_GB
> on a shared storage pool "rpC".
>
> The allocations table would have the following records in it:
>
> resource_provider  resource_class  consumer  used
> -----------------  --------------  --------  ----
> rpA                VCPU            i1        2
> rpA                MEMORY_MB       i1        2048
> rpC                DISK_GB         i1        50
>
> Now, we need to migrate instance "i1" to compute node "rpB". The instance
> disk uses shared storage so the only allocation records we actually need to
> modify are the VCPU and MEMORY_MB records.
>

Yeah, thinking about shared storage, this makes a lot of sense. Thanks for
the detailed explanation here!

>
> We would create the following REST API call from the resource tracker on
> the destination node:
>
> PUT /allocations/i1
> {
>   "allocations": [
>     {
>       "resource_provider": {
>         "uuid": "rpB"
>       },
>       "resources": {
>         "VCPU": 2,
>         "MEMORY_MB": 2048
>       }
>     },
>     {
>       "resource_provider": {
>         "uuid": "rpC"
>       },
>       "resources": {
>         "DISK_GB": 50
>       }
>     }
>   ]
> }
>
> The placement service would receive that request payload and immediately
> grab any existing allocation records referencing consumer_uuid of "i1". It
> would notice that records referencing "rpA" (the source compute node) are
> no longer needed. It would notice that the DISK_GB allocation hasn't
> changed. And finally it would notice that there are new VCPU and MEMORY_MB
> records referring to a new resource provider "rpB" (the destination compute
> node).
>
> A single SQL transaction would be built that executes the following:
>
> BEGIN;
>
> # Grab the source and destination compute node provider generations
> # to protect against concurrent writes...
> $RPA_GEN := SELECT generation FROM resource_providers
>   WHERE uuid = 'rpA';
> $RPB_GEN := SELECT generation FROM resource_providers
>   WHERE uuid = 'rpB';
>
> # Delete the allocation records referring to the source for the VCPU
> # and MEMORY_MB resources
> DELETE FROM allocations
>   WHERE consumer = 'i1'
>   AND resource_provider = 'rpA'
>   AND resource_class IN ('VCPU', 'MEMORY_MB');
>
> # Add allocation records referring to the destination for VCPU and
> # MEMORY_MB
> INSERT INTO allocations
>   (resource_provider, resource_class, consumer, used)
> VALUES
>   ('rpB', 'VCPU', 'i1', 2),
>   ('rpB', 'MEMORY_MB', 'i1', 2048);
>
> # Update the resource provider generations and rollback the
> # transaction if any other writer modified the resource providers
> # in between the initial read time and here.
> UPDATE resource_providers
>   SET generation = $RPA_GEN + 1
>   WHERE uuid = 'rpA'
>   AND generation = $RPA_GEN;
>
> IF ROWS_AFFECTED() == 0:
>   ROLLBACK
>
> UPDATE resource_providers
>   SET generation = $RPB_GEN + 1
>   WHERE uuid = 'rpB'
>   AND generation = $RPB_GEN;
>
> IF ROWS_AFFECTED() == 0:
>   ROLLBACK
>
> COMMIT;
>
> In this way, we keep the API as is but simply handle move operations
> transparently to the caller. The caller simply expresses what they wish the
> allocation to look like with regard to which resources are consumed from
> which resource providers, and the placement service ensures that these
> allocation records are written in an atomic fashion.
>
> Best,
> -jay
>
>> So which direction should we go here?
>>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
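
For anyone who wants to experiment with the idea, the generation-checked
transaction Jay describes above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the placement service's
code: it uses Python's sqlite3 module, a simplified two-table schema taken
from the example in this thread, and the hypothetical names make_db,
replace_allocations and ConcurrentUpdate.

```python
import sqlite3


class ConcurrentUpdate(Exception):
    """A resource provider's generation changed during the transaction."""


def make_db():
    """Build an in-memory database holding this thread's example records."""
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN/COMMIT/ROLLBACK statements below are in control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE resource_providers (
            uuid TEXT PRIMARY KEY, generation INTEGER NOT NULL);
        CREATE TABLE allocations (
            resource_provider TEXT, resource_class TEXT,
            consumer TEXT, used INTEGER);
        INSERT INTO resource_providers VALUES
            ('rpA', 0), ('rpB', 0), ('rpC', 0);
        INSERT INTO allocations VALUES
            ('rpA', 'VCPU', 'i1', 2),
            ('rpA', 'MEMORY_MB', 'i1', 2048),
            ('rpC', 'DISK_GB', 'i1', 50);
    """)
    return conn


def replace_allocations(conn, consumer, desired):
    """Make `consumer`'s allocations equal to `desired` in one transaction.

    `desired` maps (resource_provider, resource_class) -> used amount.
    """
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        existing = {
            (rp, rc): used
            for rp, rc, used in cur.execute(
                "SELECT resource_provider, resource_class, used"
                " FROM allocations WHERE consumer = ?", (consumer,)).fetchall()
        }
        # Records that vanished or changed amount are deleted; records that
        # are new or changed are inserted. Unchanged records are left alone.
        stale = [k for k, used in existing.items() if desired.get(k) != used]
        fresh = [(k, used) for k, used in desired.items()
                 if existing.get(k) != used]
        touched = sorted({rp for rp, _ in stale} | {k[0] for k, _ in fresh})

        # Read the generation of every provider we are about to modify.
        generations = {}
        for rp in touched:
            row = cur.execute(
                "SELECT generation FROM resource_providers WHERE uuid = ?",
                (rp,)).fetchone()
            if row is None:
                raise LookupError(f"unknown resource provider {rp}")
            generations[rp] = row[0]

        for rp, rc in stale:
            cur.execute(
                "DELETE FROM allocations WHERE consumer = ?"
                " AND resource_provider = ? AND resource_class = ?",
                (consumer, rp, rc))
        for (rp, rc), used in fresh:
            cur.execute(
                "INSERT INTO allocations"
                " (resource_provider, resource_class, consumer, used)"
                " VALUES (?, ?, ?, ?)", (rp, rc, consumer, used))

        # Bump each generation only if it still has the value read above;
        # zero affected rows means another writer got there first.
        for rp in touched:
            cur.execute(
                "UPDATE resource_providers SET generation = generation + 1"
                " WHERE uuid = ? AND generation = ?", (rp, generations[rp]))
            if cur.rowcount == 0:
                raise ConcurrentUpdate(rp)
        cur.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            cur.execute("ROLLBACK")
        raise
```

Calling replace_allocations(make_db(), "i1", {("rpB", "VCPU"): 2,
("rpB", "MEMORY_MB"): 2048, ("rpC", "DISK_GB"): 50}) performs the move from
the example: the "rpA" records are deleted, the "rpB" records are inserted,
and the unchanged DISK_GB record on "rpC" is not touched. With a single
in-memory connection the generation check cannot actually fail; it only
matters when several writers share the database.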