On Wed, Oct 10, 2018 at 2:46 PM, Jay Pipes <jaypi...@gmail.com> wrote:
> On 10/10/2018 06:32 AM, Balázs Gibizer wrote:
>> Hi,
>>
>> Thanks for all the feedback. I feel the following consensus is
>> forming:
>>
>> 1) remove the force flag in a new microversion. I've proposed a spec
>> about that API change [1]
>
> +1
>
>> 2) in the old microversions change the blind allocation copy to
>> gather every resource from the nested source RPs too and try to
>> allocate that from the destination root RP. In nested allocation
>> cases putting this allocation to placement will fail and nova will
>> fail the migration / evacuation. However it will succeed if the
>> server needs a nested allocation on neither the source nor the
>> destination host (a.k.a. the legacy case), or if the server has a
>> nested allocation on the source host but does not need one on the
>> destination host (for example the dest host does not have a nested
>> RP tree yet).
>
> I disagree on this. I'd rather just do a simple check for >1 provider
> in the allocations on the source and if True, fail hard.
>
> The reverse (going from a non-nested source to a nested destination)
> will hard fail anyway on the destination because the POST
> /allocations won't work due to capacity exceeded (or failure to have
> any inventory at all for certain resource classes on the
> destination's root compute node).
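
For illustration only, the simple check Jay describes could look roughly
like the sketch below. This is not Nova code: the function names, the
exception class and the provider keys are hypothetical, and the dict
shape is assumed to follow the placement allocations body keyed by
resource provider UUID.

```python
class ForcedMoveRejected(Exception):
    """Hypothetical error type, used only in this sketch."""


def has_multiple_providers(allocations):
    """Return True if the allocation touches more than one provider.

    `allocations` is assumed to look like
    {"<rp_uuid>": {"resources": {"VCPU": 2}}, ...}.
    Note that >1 provider does not tell nested and sharing providers
    apart; both would be rejected by this check.
    """
    return len(allocations) > 1


def check_forced_move(allocations):
    """Fail hard if the source allocation is not against a single RP."""
    if has_multiple_providers(allocations):
        raise ForcedMoveRejected(
            "source allocation spans %d providers" % len(allocations))
    return True
```

With this check a source allocation that includes any child RP is
refused before anything is written to placement, which is the case
discussed in the reply below.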
If we hard fail on >1 provider in an allocation on the source then we
lose the (not really common) case when the source allocation is nested
but the destination node does not have a nested RP tree yet and it
would support the summarized allocation on the root RP. But sure,
simply failing would be a simpler solution.

gibi

>
> -jay
>
>> I will start implementing #2) as part of the
>> use-nested-allocation-candidate bp soon and will continue with #1)
>> later in the cycle.
>>
>> Nothing is set in stone yet so feedback is still very appreciated.
>>
>> Cheers,
>> gibi
>>
>> [1] https://review.openstack.org/#/c/609330/
>>
>> On Tue, Oct 9, 2018 at 11:40 AM, Balázs Gibizer
>> <balazs.gibi...@ericsson.com> wrote:
>>> Hi,
>>>
>>> Setup
>>> -----
>>>
>>> nested allocation: an allocation that contains resources from one or
>>> more nested RPs. (If you have a better term for this then please
>>> suggest it.)
>>>
>>> If an instance has a nested allocation it means that the compute it
>>> allocates from has a nested RP tree. BUT if a compute has a nested
>>> RP tree it does not automatically mean that the instance allocating
>>> from that compute has a nested allocation (e.g. bandwidth inventory
>>> will be on nested RPs but not every instance will require
>>> bandwidth).
>>>
>>> Afaiu, as soon as we have NUMA modelling in place even the most
>>> trivial servers will have nested allocations as CPU and MEMORY
>>> inventory will be moved to the nested NUMA RPs. But NUMA is still in
>>> the future.
>>>
>>> Sidenote: there is an edge case reported by bauzas when an instance
>>> allocates _only_ from nested RPs. This was discussed last Friday
>>> and it resulted in a new patch[0] but I would like to keep that
>>> discussion separate from this if possible.
>>>
>>> Sidenote: the current problem is somewhat related not just to nested
>>> RPs but to sharing RPs as well.
>>> However, I'm not aiming to implement
>>> sharing support in Nova right now so I also try to keep the sharing
>>> discussion separate if possible.
>>>
>>> There was already some discussion at Monday's scheduler meeting
>>> but I could not attend.
>>> http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-08-14.00.log.html#l-20
>>>
>>>
>>> The meat
>>> --------
>>>
>>> Both live-migrate[1] and evacuate[2] have an optional force flag on
>>> the nova REST API. The documentation says: "Force <the action> by
>>> not verifying the provided destination host by the scheduler."
>>>
>>> Nova implements this statement by not calling the scheduler if
>>> force=True BUT still tries to manage allocations in placement.
>>>
>>> To have an allocation on the destination host Nova blindly copies
>>> the instance allocation from the source host to the destination host
>>> during these operations. Nova can do that as 1) the whole allocation
>>> is against a single RP (the compute RP) and 2) Nova knows both the
>>> source compute RP and the destination compute RP.
>>>
>>> However as soon as we bring nested allocations into the picture that
>>> blind copy will not be feasible. Possible cases:
>>> 0) The instance has a non-nested allocation on the source and would
>>> need a non-nested allocation on the destination. This works with the
>>> blind copy today.
>>> 1) The instance has a nested allocation on the source and would need
>>> a nested allocation on the destination as well.
>>> 2) The instance has a non-nested allocation on the source and would
>>> need a nested allocation on the destination.
>>> 3) The instance has a nested allocation on the source and would need
>>> a non-nested allocation on the destination.
>>>
>>> Nova cannot generate nested allocations easily without
>>> reimplementing some of the placement allocation candidate (a_c)
>>> code. However I don't like the idea of duplicating some of the a_c
>>> code in Nova.
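
To make the cases above concrete, here is a rough sketch of the blind
copy next to the variant from later in the thread that gathers every
resource from the source RPs and requests the totals from the
destination root RP. It is a sketch under assumptions, not the Nova
implementation: the function names, the provider keys ("src-root",
"src-nic", "dst-root") and the CUSTOM_BANDWIDTH resource class are
hypothetical, and allocations are modelled as plain dicts keyed by
resource provider.

```python
from collections import Counter


def blind_copy(source_allocations, source_rp, dest_rp):
    """Re-target an allocation held entirely on source_rp to dest_rp.

    Relies on the two preconditions from the mail: the whole allocation
    is against a single RP, and both compute RPs are known.
    """
    if set(source_allocations) != {source_rp}:
        raise ValueError(
            "allocation is not solely against the source compute RP")
    resources = dict(source_allocations[source_rp]["resources"])
    return {dest_rp: {"resources": resources}}


def summed_copy(source_allocations, dest_root_rp):
    """Add up each resource class over all source RPs and request the
    totals from the destination root RP only."""
    totals = Counter()
    for allocation in source_allocations.values():
        totals.update(allocation["resources"])
    return {dest_root_rp: {"resources": dict(totals)}}


# Case 0: non-nested source, the blind copy applies.
flat = {"src-root": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}
print(blind_copy(flat, "src-root", "dst-root"))

# Cases 1 and 3: nested source, the blind copy precondition no longer
# holds, so only the summed variant produces a request.
nested = {
    "src-root": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    "src-nic": {"resources": {"CUSTOM_BANDWIDTH": 1000}},
}
print(summed_copy(nested, "dst-root"))
```

Whether placement accepts the summed request depends on the inventory
of the destination root RP, so in case 1 the claim would be expected to
fail while in case 3 it could succeed.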
>>>
>>> Nova cannot detect what kind of allocation (nested or non-nested) an
>>> instance would need on the destination without calling placement
>>> a_c. So knowing when to call placement is a chicken and egg problem.
>>>
>>> Possible solutions:
>>> A) fail fast
>>> ------------
>>> 0) Nova can detect that the source allocation is non-nested and try
>>> the blind copy and it will succeed.
>>> 1) Nova can detect that the source allocation is nested and fail the
>>> operation.
>>> 2) Nova only sees a non-nested source allocation. Even if the dest
>>> RP tree is nested it does not mean that the allocation will be
>>> nested. We cannot fail fast. Nova can try the blind copy and
>>> allocate every resource from the root RP of the destination. If the
>>> instance requires a nested allocation instead, the claim will fail
>>> in placement. So nova can fail the operation a bit later than in 1).
>>> 3) Nova can detect that the source allocation is nested and fail the
>>> operation. However an enhanced blind copy that tries to allocate
>>> everything from the root RP on the destination would have worked.
>>>
>>> B) Guess when to ignore the force flag and call the scheduler
>>> -------------------------------------------------------------
>>> 0) Keep the blind copy as it works.
>>> 1) Nova detects that the source allocation is nested. It ignores the
>>> force flag and calls the scheduler that will call placement a_c.
>>> The move operation can succeed.
>>> 2) Nova only sees a non-nested source allocation so it will fall
>>> back to the blind copy and fail at the claim on the destination.
>>> 3) Nova detects that the source allocation is nested. It ignores the
>>> force flag and calls the scheduler that will call placement a_c.
>>> The move operation can succeed.
>>>
>>> This solution would be against the API doc that states nova does not
>>> call the scheduler if the operation is forced.
>>> However, in the case of a forced live-migration Nova already
>>> verifies the target host from a couple of perspectives in [3].
>>> This solution is already proposed for live-migrate in [4] and for
>>> evacuate in [5] so the complexity of the solution can be seen in the
>>> reviews.
>>>
>>> C) Remove the force flag from the API in a new microversion
>>> -----------------------------------------------------------
>>> 0)-3): all cases would call the scheduler to verify the target host
>>> and generate the nested (or non-nested) allocation.
>>> We would still need an agreed behavior (from A), B), D)) for the old
>>> microversions as today's code creates inconsistent allocations in
>>> #1) and #3) by ignoring the resources from the nested RP.
>>>
>>> D) Do not manage allocations in placement for forced operation
>>> --------------------------------------------------------------
>>> The force flag is considered a last-resort tool for the admin to
>>> move VMs around. The API doc has a fat warning about the danger of
>>> it. So Nova can simply skip the resource allocation task if
>>> force=True. Nova would delete the source allocation and would not
>>> create any allocation on the destination host.
>>>
>>> This is a simple but dangerous solution, but it is what the force
>>> flag is all about: moving the server against all the built-in
>>> safeties. (If the admin needs the safeties she can set force=False
>>> and still specify the destination host.)
>>>
>>> I'm open to any suggestions.
>>>
>>> Cheers,
>>> gibi
>>>
>>> [0] https://review.openstack.org/#/c/608298/
>>> [1] https://developer.openstack.org/api-ref/compute/#live-migrate-server-os-migratelive-action
>>> [2] https://developer.openstack.org/api-ref/compute/#evacuate-server-evacuate-action
>>> [3] https://github.com/openstack/nova/blob/c5a7002bd571379818c0108296041d12bc171728/nova/conductor/tasks/live_migrate.py#L97
>>> [4] https://review.openstack.org/#/c/605785
>>> [5] https://review.openstack.org/#/c/606111
>>>
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev