On 10/10/2018 06:32 AM, Balázs Gibizer wrote:
Hi,
Thanks for all the feedback. I feel the following consensus is forming:
1) remove the force flag in a new microversion. I've proposed a spec
about that API change [1]
+1
2) in the old microversions change the blind allocation copy to gather
every resource from the nested source RPs too and try to allocate them
from the destination root RP. In nested allocation cases putting this
allocation to placement will fail and nova will fail the migration /
evacuation. However it will succeed if the server does not need a nested
allocation on either the source or the destination host (a.k.a. the
legacy case), or if the server has a nested allocation on the source host
but does not need a nested allocation on the destination host (for
example the dest host does not have a nested RP tree yet).
I disagree on this. I'd rather just do a simple check for >1 provider in
the allocations on the source and if True, fail hard.
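For illustration, such a check could be as small as the sketch below. It
assumes the source allocations are a dict keyed by resource provider UUID
(the shape of the "allocations" value placement returns for a consumer).
The function and exception names are hypothetical, not existing nova code.

```python
class NestedAllocationNotSupported(Exception):
    """Hypothetical exception; not an existing nova class."""


def fail_if_multiple_providers(allocations):
    """Fail hard if the source allocations span more than one provider.

    ``allocations`` is assumed to be keyed by resource provider UUID,
    e.g. {rp_uuid: {"resources": {"VCPU": 2}}}.
    """
    if len(allocations) > 1:
        raise NestedAllocationNotSupported(
            "allocation spans %d resource providers; forced move refused"
            % len(allocations))


# Example data only: a legacy allocation against a single compute RP.
single = {"compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
fail_if_multiple_providers(single)  # passes silently
```

Note that a plain provider count would also reject allocations that
involve sharing providers, not only nested ones.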
The reverse (going from a non-nested source to a nested destination)
will hard fail anyway on the destination because the POST /allocations
won't work due to capacity exceeded (or failure to have any inventory at
all for certain resource classes on the destination's root compute node).
-jay
I will start implementing #2) as part of the
use-nested-allocation-candidate bp soon and will continue with #1)
later in the cycle.
Nothing is set in stone yet so feedback is still very appreciated.
Cheers,
gibi
[1] https://review.openstack.org/#/c/609330/
On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer
<balazs.gibi...@ericsson.com> wrote:
Hi,
Setup
-----
nested allocation: an allocation that contains resources from one or
more nested RPs. (If you have a better term for this then please
suggest it.)
If an instance has a nested allocation, it means that the compute it
allocates from has a nested RP tree. BUT if a compute has a nested
RP tree, it does not automatically mean that an instance allocating
from that compute has a nested allocation (e.g. bandwidth inventory
will be on nested RPs but not every instance will require bandwidth).
Afaiu, as soon as we have NUMA modelling in place even the most trivial
servers will have nested allocations as CPU and MEMORY inventory
will be moved to the nested NUMA RPs. But NUMA is still in the future.
Sidenote: there is an edge case reported by bauzas when an instance
allocates _only_ from nested RPs. This was discussed last Friday
and it resulted in a new patch[0] but I would like to keep that
discussion separate from this if possible.
Sidenote: the current problem is somewhat related not just to nested RPs
but to sharing RPs as well. However I'm not aiming to implement
sharing support in Nova right now so I will also try to keep the sharing
discussion separate if possible.
There was already some discussion at Monday's scheduler meeting
but I could not attend.
http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
The meat
--------
Both live-migrate[1] and evacuate[2] have an optional force flag on
the nova REST API. The documentation says: "Force <the action> by not
verifying the provided destination host by the scheduler."
Nova implements this statement by not calling the scheduler if
force=True BUT it still tries to manage allocations in placement.
To have an allocation on the destination host, Nova blindly copies the
instance allocation from the source host to the destination host
during these operations. Nova can do that as 1) the whole allocation
is against a single RP (the compute RP) and 2) Nova knows both the
source compute RP and the destination compute RP.
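As a rough sketch of what the blind copy amounts to (not the actual nova
code): the single-provider allocation is simply re-keyed from the source
compute RP to the destination compute RP. The function name and the
dict shape, keyed by resource provider UUID, are assumptions made for
illustration.

```python
def blind_copy_allocation(source_allocations, source_rp_uuid, dest_rp_uuid):
    """Re-key a single-provider allocation onto the destination compute RP.

    Hypothetical helper. ``source_allocations`` is assumed to look like
    {rp_uuid: {"resources": {"VCPU": 2, ...}}}.
    """
    # The blind copy is only valid when everything is allocated from
    # the source compute RP itself.
    if set(source_allocations) != {source_rp_uuid}:
        raise ValueError(
            "expected an allocation against the source compute RP only")
    resources = dict(source_allocations[source_rp_uuid]["resources"])
    return {dest_rp_uuid: {"resources": resources}}


# Example data only.
src = {"src-compute": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
dest = blind_copy_allocation(src, "src-compute", "dest-compute")
```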
However as soon as we bring nested allocations into the picture that
blind copy will not be feasible. Possible cases:
0) The instance has a non-nested allocation on the source and would
need a non-nested allocation on the destination. This works with the
blind copy today.
1) The instance has a nested allocation on the source and would need
a nested allocation on the destination as well.
2) The instance has a non-nested allocation on the source and would
need a nested allocation on the destination.
3) The instance has a nested allocation on the source and would need
a non-nested allocation on the destination.
Nova cannot generate nested allocations easily without reimplementing
some of the placement allocation candidate (a_c) code. However I
don't like the idea of duplicating some of the a_c code in Nova.
Nova cannot detect what kind of allocation (nested or non-nested) an
instance would need on the destination without calling placement a_c.
So knowing when to call placement is a chicken and egg problem.
Possible solutions:
A) fail fast
------------
0) Nova can detect that the source allocation is non-nested and try
the blind copy, and it will succeed.
1) Nova can detect that the source allocation is nested and fail the
operation.
2) Nova only sees a non-nested source allocation. Even if the dest RP
tree is nested it does not mean that the allocation will be nested.
We cannot fail fast. Nova can try the blind copy and allocate every
resource from the root RP of the destination. If the instance
requires a nested allocation instead, the claim will fail in placement.
So nova can fail the operation a bit later than in 1).
3) Nova can detect that the source allocation is nested and fail the
operation. However an enhanced blind copy that tries to allocate
everything from the root RP on the destination would have worked.
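The enhanced blind copy mentioned in case 3) could be sketched roughly
as below: sum every resource class across all source RPs and request the
totals from the destination root RP. This is only an illustration under
the assumption that allocations are dicts keyed by resource provider
UUID; the function name is hypothetical and placement would still reject
the result if the destination root RP lacks the inventory.

```python
from collections import Counter


def flatten_to_dest_root(source_allocations, dest_root_rp_uuid):
    """Gather resources from all source RPs onto the destination root RP.

    Hypothetical helper. ``source_allocations`` is assumed to look like
    {rp_uuid: {"resources": {resource_class: amount}}}.
    """
    totals = Counter()
    for alloc in source_allocations.values():
        # Counter.update() adds amounts per resource class.
        totals.update(alloc["resources"])
    return {dest_root_rp_uuid: {"resources": dict(totals)}}


# Example data only: a root compute RP plus one nested RP.
nested_src = {
    "src-root": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "src-nic": {"resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}},
}
flat = flatten_to_dest_root(nested_src, "dest-root")
```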
B) Guess when to ignore the force flag and call the scheduler
-------------------------------------------------------------
0) keep the blind copy as it works
1) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.
2) Nova only sees a non-nested source allocation so it will fall back
to the blind copy and fail at the claim on the destination.
3) Nova detects that the source allocation is nested. It ignores the
force flag and calls the scheduler, which will call placement a_c. The
move operation can succeed.
This solution would be against the API doc that states nova does not
call the scheduler if the operation is forced. However in case of a
forced live-migration Nova already verifies the target host from a
couple of perspectives in [3].
This solution is already proposed for live-migrate in [4] and for
evacuate in [5] so the complexity of the solution can be seen in the
reviews.
C) Remove the force flag from the API in a new microversion
-----------------------------------------------------------
0)-3): all cases would call the scheduler to verify the target host
and generate the nested (or non-nested) allocation.
We would still need an agreed behavior (from A), B), D)) for the old
microversions as today's code creates inconsistent allocations in
#1) and #3) by ignoring the resources from the nested RP.
D) Do not manage allocations in placement for forced operation
--------------------------------------------------------------
The force flag is considered a last resort tool for the admin to move
VMs around. The API doc has a fat warning about the danger of it. So
Nova can simply skip the resource allocation task if force=True. Nova
would delete the source allocation and would not create any allocation
on the destination host.
This is a simple but dangerous solution, but that is what the force flag
is all about: moving the server against all the built-in safeties. (If
the admin needs the safeties she can set force=False and still
specify the destination host.)
I'm open to any suggestions.
Cheers,
gibi
[0] https://review.openstack.org/#/c/608298/
[1]
https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
[2]
https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
[3]
https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
[4] https://review.openstack.org/#/c/605785
[5] https://review.openstack.org/#/c/606111
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev