On 4/29/2016 5:32 PM, Murray, Paul (HP Cloud) wrote:
The following summarizes the status of the main topics relating to live
migration after the Newton design summit. Please feel free to correct
any inaccuracies or add additional information.



Paul



-------------------------------------------------------------



Libvirt storage pools



The storage pools work has been selected as one of the project review
priorities for Newton.

(see https://etherpad.openstack.org/p/newton-nova-summit-priorities )



Continuation of the libvirt storage pools work was discussed in the live
migration session. The proposal has grown to include a refactor of the
existing libvirt driver instance storage code. The justification for
this is based on three factors:

1. The code needs to be refactored to use storage pools.

2. The code is complicated and relies on inspection, which is poor
practice.

3. During the investigation Matt Booth discovered two CVEs in the code,
suggesting further work is justified.



So the proposal is now to follow three stages (a minimal sketch of the
storage pool mechanism follows the list):

1. Refactor the instance storage code.

2. Adapt it to use storage pools for instance storage.

3. Use storage pools to drive resize/migration.
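
For anyone who hasn't used the libvirt storage pool APIs, here is a
minimal sketch of the mechanism the refactor would build on, using the
libvirt Python bindings. The pool name, paths and volume settings are
illustrative only, not Nova's actual instance storage layout:

    import libvirt

    # Illustrative directory-backed pool; any libvirt pool type
    # (dir, logical/LVM, rbd, netfs, ...) exposes the same volume API.
    POOL_XML = """
    <pool type='dir'>
      <name>nova-instances</name>
      <target>
        <path>/var/lib/nova/instances</path>
      </target>
    </pool>
    """

    VOLUME_XML = """
    <volume>
      <name>instance-0001-disk</name>
      <capacity unit='G'>10</capacity>
      <target>
        <format type='qcow2'/>
      </target>
    </volume>
    """

    conn = libvirt.open('qemu:///system')

    # Define the pool once, start it, and mark it autostart so it
    # survives host reboots.
    pool = conn.storagePoolDefineXML(POOL_XML, 0)
    pool.setAutostart(True)
    pool.create(0)

    # Instance disks become volumes managed by libvirt rather than
    # bare image files Nova has to introspect.
    vol = pool.createXML(VOLUME_XML, 0)
    print(vol.path())

The point is that the disk format and location are tracked by libvirt
itself, which is what removes the need for the inspection mentioned
above.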

We also talked about the need for some additional test coverage for the refactor work:

1. A job that uses LVM on the experimental queue.

2. ploop should be covered by the Virtuozzo Compute third-party CI, but we'll need to double-check the test coverage there (i.e., whether it runs the tests that hit the code paths being refactored). Note that they have their own blueprint for implementing resize for ploop:

https://blueprints.launchpad.net/nova/+spec/virtuozzo-instance-resize-support

3. Ceph testing - we already have a single-node job for Ceph that will test the resize paths. We should also be testing Ceph-backed live migration in the special live-migration job that Timofey has been working on.

4. NFS testing - this also falls into the special live migration CI job that will test live migration in different storage configurations within a single run.




Matt already has code starting the refactor and will continue with help
from Paul Carlton and Paul Murray. We will look for additional
contributors to help as we plan out the patches.



https://review.openstack.org/#/c/302117 : Persist libvirt instance
storage metadata

https://review.openstack.org/#/c/310505 : Use libvirt storage pools

https://review.openstack.org/#/c/310538 : Migrate libvirt volumes



Post copy



The spec to add post-copy migration support in the libvirt driver was
discussed in the live migration session. Post-copy guarantees completion
of a migration in linear time without needing to pause the VM, so it can
be used as an alternative to pausing in live-migration-force-complete.
Pausing or completing could also be invoked automatically under some
circumstances. The issue slowing these specs is how to decide which
method to use, given that they provide different user experiences, while
not wanting to expose virt-specific features in the API. The two
additional specs listed below suggest possible generic ways to address
the issue.
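
To make the mechanism concrete, here is a rough sketch of the libvirt
primitives involved (libvirt >= 1.3.3), using the Python bindings. The
URIs, domain name and switch-over trigger are placeholders, not what
Nova would actually do:

    import threading

    import libvirt

    # VIR_MIGRATE_POSTCOPY only *allows* post-copy; the migration
    # still starts as normal iterative pre-copy.
    flags = (libvirt.VIR_MIGRATE_LIVE
             | libvirt.VIR_MIGRATE_PEER2PEER
             | libvirt.VIR_MIGRATE_POSTCOPY)

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-0001')

    # migrateToURI3() blocks until the migration finishes, so run it
    # in a worker thread and decide about post-copy from the caller.
    worker = threading.Thread(
        target=dom.migrateToURI3,
        args=('qemu+tcp://dest-host/system', {}, flags))
    worker.start()

    # Later (e.g. when live-migration-force-complete is called, or
    # when the guest dirties memory faster than it can be copied),
    # switch the still-running migration to post-copy instead of
    # pausing the guest. Real code would first confirm the migration
    # is actually running.
    dom.migrateStartPostCopy(0)
    worker.join()

Once migrateStartPostCopy() is called the migration is guaranteed to
finish (barring network failure), because remaining pages are pulled on
demand by the destination instead of being re-copied as the guest
dirties them.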



No conclusions were reached in the session, so the debate will continue
on the specs. The first link below is the main spec for the feature.



https://review.openstack.org/#/c/301509 : Adds post-copy live migration
support to Nova

https://review.openstack.org/#/c/305425 : Define instance availability
profiles

https://review.openstack.org/#/c/306561 : Automatic Live Migration
Completion



Live Migration orchestrated via conductor



The proposal to move orchestration of live migration to the conductor
was discussed in the working session on Friday, presented by Andrew
Laski on behalf of Timofey Durakov. This one threw up a lot of debate
both for and against the general idea, but there was no support for the
patches that have been submitted along with the spec so far. The general
feeling was that we need to attack this, but that we should take some
simple cleanup steps first to get a better idea of the problem. Dan
Smith proposed moving the stateless pre-migration steps to a sequence of
calls from the conductor (as opposed to the current back and forth
between the computes) as the first step.
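
As a hypothetical sketch of what that first step could look like (the
class and method signatures here are simplified for illustration and
are not the actual conductor code):

    class LiveMigrationTask(object):
        """Conductor-side task driving the pre-migration checks.

        Sketch only: instead of the source compute calling the
        destination and the destination calling back, the conductor
        makes each stateless call itself and threads the results
        through.
        """

        def __init__(self, compute_rpcapi, instance, source, destination):
            self.compute_rpcapi = compute_rpcapi
            self.instance = instance
            self.source = source
            self.destination = destination

        def _check_requirements(self, context):
            # Conductor asks the destination whether it can accept the
            # instance, then passes the answer to the source-side check.
            dest_data = self.compute_rpcapi.check_can_live_migrate_destination(
                context, self.instance, self.destination)
            return self.compute_rpcapi.check_can_live_migrate_source(
                context, self.instance, dest_data)

        def execute(self, context):
            migrate_data = self._check_requirements(context)
            # Only the migration itself remains a long-running call
            # into the source compute.
            self.compute_rpcapi.live_migration(
                context, self.instance, self.destination, migrate_data)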



https://review.openstack.org/#/c/292271 : Remove compute-compute
communication in live-migration



Cold and Live Migration Scheduling



When this patch merges, all migrations will use the request spec for
scheduling: https://review.openstack.org/#/c/284974

Work is still ongoing on check destinations (allowing the scheduler to
check a destination chosen by the admin). When that is complete,
migrations can be placed in three ways (sketched after the list):

1. Destination chosen by the scheduler

2. Destination chosen by the admin but checked by the scheduler

3. Destination forced by the admin
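
Roughly, the branching could look like the following sketch (the
parameter names and the requested_destination attribute are assumed for
illustration, not the final RequestSpec interface):

    def pick_destination(scheduler_client, context, request_spec,
                         requested_host=None, force=False):
        if requested_host is None:
            # 1. Destination chosen by the scheduler.
            return scheduler_client.select_destinations(
                context, request_spec)[0]
        if force:
            # 3. Destination forced by the admin: bypass the scheduler
            #    entirely, so no filters or claims are checked.
            return requested_host
        # 2. Destination chosen by the admin but checked by the
        #    scheduler: constrain the request spec to that one host and
        #    let the scheduler filters accept or reject it.
        request_spec.requested_destination = requested_host
        return scheduler_client.select_destinations(
            context, request_spec)[0]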



https://review.openstack.org/#/c/296408 : Re-proposes checking the
destination on migrations



PCI + NUMA claims



Moshe and Jay are making great progress refactoring Nicola’s patches to
fix PCI and NUMA handling in migrations. The patch series should be
completed soon.

The patch series is here (it depends on some cleanups from Jay, and the top patch needs to be rebased):

https://review.openstack.org/#/c/307124/

It would be great if we could test this with some NFV CI, but from the notes in the session it sounds like we need a multi-node job for this?
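
For context on what "fixing PCI and NUMA handling" means here: a
migration has to claim NUMA topology and PCI devices on the destination
just like a new boot does, otherwise concurrent moves can land on the
same resources. A purely illustrative toy model (not Nova's actual
ResourceTracker API):

    class DestinationClaim(object):
        def __init__(self, free_cpus_per_cell, free_pci_devices):
            # e.g. {0: 8, 1: 8} dedicated CPUs free per NUMA cell, and
            # ['8086:10fb'] free PCI devices by vendor:product.
            self.free_cpus = dict(free_cpus_per_cell)
            self.free_pci = list(free_pci_devices)

        def claim(self, cpus_per_cell, pci_requests):
            # Check everything first so a failed claim leaves no
            # partial state behind.
            for cell, cpus in cpus_per_cell.items():
                if self.free_cpus.get(cell, 0) < cpus:
                    raise RuntimeError('NUMA cell %s does not fit' % cell)
            remaining = list(self.free_pci)
            for dev in pci_requests:
                if dev not in remaining:
                    raise RuntimeError('no free PCI device %s' % dev)
                remaining.remove(dev)
            # Commit the claim so a concurrent migration cannot be
            # handed the same CPUs or devices.
            self.free_pci = remaining
            for cell, cpus in cpus_per_cell.items():
                self.free_cpus[cell] -= cpus

    claim = DestinationClaim({0: 8, 1: 8}, ['8086:10fb'])
    claim.claim({0: 4}, ['8086:10fb'])  # succeeds and reserves resources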








Thanks for the great write-up Paul, you've saved me some time. :) And thanks to the whole sub-team working on this for keeping up the focus.

--

Thanks,

Matt Riedemann


