Re: [openstack-dev] [nova] Supporting force live-migrate and force evacuate with nested allocations

Balázs Gibizer Wed, 10 Oct 2018 06:18:55 -0700


On Wed, Oct 10, 2018 at 2:46 PM, Jay Pipes <[email protected]> wrote:
> On 10/10/2018 06:32 AM, Balázs Gibizer wrote:
>> Hi,
>> 
>> Thanks for all the feedback. I feel the following consensus is forming:
>> 
>> 1) remove the force flag in a new microversion. I've proposed a spec
>> about that API change [1]
> 
> +1
> 
>> 2) in the old microversions, change the blind allocation copy to gather
>> every resource from the nested source RPs too and try to allocate that
>> from the destination root RP. In nested allocation cases putting this
>> allocation to placement will fail and nova will fail the migration /
>> evacuation. However it will succeed if the server does not need a nested
>> allocation on either the source or the destination host (a.k.a. the
>> legacy case), or if the server has a nested allocation on the source host
>> but does not need a nested allocation on the destination host (for
>> example the dest host does not have a nested RP tree yet).
> 
> I disagree on this. I'd rather just do a simple check for >1 provider 
> in the allocations on the source and if True, fail hard.
> 
> The reverse (going from a non-nested source to a nested destination) 
> will hard fail anyway on the destination because the POST 
> /allocations won't work due to capacity exceeded (or failure to have 
> any inventory at all for certain resource classes on the 
> destination's root compute node).


If we hard fail on >1 provider in an allocation on the source then we 
lose the (not really common) case when the source allocation is nested 
but the destination node does not have a nested RP tree yet and it 
would support the summarized allocation on the root RP.
But sure, simply failing would be the simpler solution.
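
To make the difference between the two approaches concrete, here is a minimal sketch, not actual nova code. It assumes the source allocation is a dict keyed by resource provider UUID with a "resources" map per provider, as placement returns it for a consumer. The function names, the exception, the fail_hard switch and the sample UUIDs are all hypothetical and only serve the illustration.

```python
from collections import Counter


class NestedSourceAllocation(Exception):
    """Hypothetical error for refusing a forced move of a nested allocation."""


def spans_multiple_providers(allocations):
    # The "simple check": more than one provider in the source allocation.
    return len(allocations) > 1


def summarize_onto_root(allocations, dest_root_rp_uuid):
    # The "enhanced blind copy": sum every resource class over all source
    # providers and request the total from the destination root RP only.
    totals = Counter()
    for provider_alloc in allocations.values():
        totals.update(provider_alloc["resources"])
    return {dest_root_rp_uuid: {"resources": dict(totals)}}


def build_forced_move_allocation(allocations, dest_root_rp_uuid, fail_hard):
    # fail_hard=True models the hard fail on >1 provider,
    # fail_hard=False models the summarized allocation on the root RP.
    if fail_hard and spans_multiple_providers(allocations):
        raise NestedSourceAllocation("source allocation spans several RPs")
    return summarize_onto_root(allocations, dest_root_rp_uuid)


# Illustrative source allocation: compute root RP plus one nested RP.
source_allocations = {
    "src-root-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "src-nested-rp": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}
```

With fail_hard=False the result is a single-provider allocation against the destination root RP. Placement would still reject it if that root RP has no inventory for one of the summed resource classes, which is the later failure described in 2) above.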

gibi

> 
> -jay
> 
>> I will start implementing #2) as part of the
>> use-nested-allocation-candidate bp soon and will continue with #1)
>> later in the cycle.
>> 
>> Nothing is set in stone yet so feedback is still very appreciated.
>> 
>> Cheers,
>> gibi
>> 
>> [1] https://review.openstack.org/#/c/609330/
>> 
>> On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer
>> <[email protected]> wrote:
>>> Hi,
>>> 
>>> Setup
>>> -----
>>> 
>>> nested allocation: an allocation that contains resources from one or
>>> more nested RPs. (If you have a better term for this then please
>>> suggest it.)
>>> 
>>> If an instance has a nested allocation, it means that the compute it
>>> allocates from has a nested RP tree. BUT if a compute has a nested
>>> RP tree, it does not automatically mean that the instance allocating
>>> from that compute has a nested allocation (e.g. bandwidth inventory
>>> will be on nested RPs but not every instance will require bandwidth).
>>> 
>>> Afaiu, as soon as we have NUMA modelling in place even the most trivial
>>> servers will have nested allocations as CPU and MEMORY inventory
>>> will be moved to the nested NUMA RPs. But NUMA is still in the future.
>>> 
>>> Sidenote: there is an edge case reported by bauzas where an instance
>>> allocates _only_ from nested RPs. This was discussed last Friday
>>> and it resulted in a new patch[0] but I would like to keep that
>>> discussion separate from this if possible.
>>> 
>>> Sidenote: the current problem is somewhat related not just to nested RPs
>>> but to sharing RPs as well. However I'm not aiming to implement
>>> sharing support in Nova right now so I will also try to keep the sharing
>>> discussion separate if possible.
>>> 
>>> There was already some discussion at Monday's scheduler meeting
>>> but I could not attend.
>>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>> 
>>> 
>>> The meat
>>> --------
>>> 
>>> Both live-migrate[1] and evacuate[2] have an optional force flag on
>>> the nova REST API. The documentation says: "Force <the action> by not
>>> verifying the provided destination host by the scheduler."
>>> 
>>> Nova implements this statement by not calling the scheduler if
>>> force=True BUT it still tries to manage allocations in placement.
>>> 
>>> To have an allocation on the destination host Nova blindly copies the
>>> instance allocation from the source host to the destination host
>>> during these operations. Nova can do that as 1) the whole allocation
>>> is against a single RP (the compute RP) and 2) Nova knows both the
>>> source compute RP and the destination compute RP.
>>> 
>>> However as soon as we bring nested allocations into the picture that
>>> blind copy will not be feasible. Possible cases:
>>> 0) The instance has a non-nested allocation on the source and would
>>> need a non-nested allocation on the destination. This works with the
>>> blind copy today.
>>> 1) The instance has a nested allocation on the source and would need
>>> a nested allocation on the destination as well.
>>> 2) The instance has a non-nested allocation on the source and would
>>> need a nested allocation on the destination.
>>> 3) The instance has a nested allocation on the source and would need
>>> a non-nested allocation on the destination.
>>> 
>>> Nova cannot generate nested allocations easily without reimplementing
>>> some of the placement allocation candidate (a_c) code. However I
>>> don't like the idea of duplicating some of the a_c code in Nova.
>>> 
>>> Nova cannot detect what kind of allocation (nested or non-nested) an
>>> instance would need on the destination without calling placement a_c.
>>> So knowing when to call placement is a chicken and egg problem.
>>> 
>>> Possible solutions:
>>> A) fail fast
>>> ------------
>>> 0) Nova can detect that the source allocation is non-nested and try
>>> the blind copy and it will succeed.
>>> 1) Nova can detect that the source allocation is nested and fail the
>>> operation.
>>> 2) Nova only sees a non-nested source allocation. Even if the dest RP
>>> tree is nested it does not mean that the allocation will be nested.
>>> We cannot fail fast. Nova can try the blind copy and allocate every
>>> resource from the root RP of the destination. If the instance
>>> requires a nested allocation instead, the claim will fail in placement.
>>> So nova can fail the operation a bit later than in 1).
>>> 3) Nova can detect that the source allocation is nested and fail the
>>> operation. However an enhanced blind copy that tries to allocate
>>> everything from the root RP on the destination would have worked.
>>> 
>>> B) Guess when to ignore the force flag and call the scheduler
>>> -------------------------------------------------------------
>>> 0) keep the blind copy as it works
>>> 1) Nova detects that the source allocation is nested, ignores the
>>> force flag and calls the scheduler that will call placement a_c. The
>>> move operation can succeed.
>>> 2) Nova only sees a non-nested source allocation so it will fall back
>>> to the blind copy and fail at the claim on the destination.
>>> 3) Nova detects that the source allocation is nested, ignores the
>>> force flag and calls the scheduler that will call placement a_c. The
>>> move operation can succeed.
>>> 
>>> This solution would be against the API doc that states nova does not
>>> call the scheduler if the operation is forced. However in case of
>>> force live-migration Nova already verifies the target host from a
>>> couple of perspectives in [3].
>>> This solution is already proposed for live-migrate in [4] and for
>>> evacuate in [5] so the complexity of the solution can be seen in the
>>> reviews.
>>> 
>>> C) Remove the force flag from the API in a new microversion
>>> -----------------------------------------------------------
>>> 0)-3): all cases would call the scheduler to verify the target host
>>> and generate the nested (or non-nested) allocation.
>>> We would still need an agreed behavior (from A), B), D)) for the old
>>> microversions as today's code creates an inconsistent allocation in
>>> #1) and #3) by ignoring the resources from the nested RP.
>>> 
>>> D) Do not manage allocations in placement for forced operation
>>> --------------------------------------------------------------
>>> The force flag is considered a last resort tool for the admin to move
>>> VMs around. The API doc has a fat warning about the danger of it. So
>>> Nova can simply ignore the resource allocation task if force=True. Nova
>>> would delete the source allocation and would not create any allocation
>>> on the destination host.
>>> 
>>> This is a simple but dangerous solution, but it is what the force flag
>>> is all about: move the server against all the built-in safeties. (If
>>> the admin needs the safeties she can set force=False and still
>>> specify the destination host.)
>>> 
>>> I'm open to any suggestions.
>>> 
>>> Cheers,
>>> gibi
>>> 
>>> [0] https://review.openstack.org/#/c/608298/
>>> [1]
>>> https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>>> [2]
>>> https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>>> [3]
>>> https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>>> [4] https://review.openstack.org/#/c/605785
>>> [5] https://review.openstack.org/#/c/606111
>>> 
>> 
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: 
>> [email protected]?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 


