On 4/29/2016 5:32 PM, Murray, Paul (HP Cloud) wrote:
The following summarizes the status of the main topics relating to live
migration after the Newton design summit. Please feel free to correct
any inaccuracies or add additional information.
Paul
-------------------------------------------------------------
Libvirt storage pools
The storage pools work has been selected as one of the project review
priorities for Newton.
(see https://etherpad.openstack.org/p/newton-nova-summit-priorities )
Continuation of the libvirt storage pools work was discussed in the live
migration session. The proposal has grown to include a refactor of the
existing libvirt driver instance storage code. Justification for this is
based on three factors:
1. The code needs to be refactored to use storage pools
2. The code is complicated and relies on inspection, which is poor practice
3. During the investigation Matt Booth discovered two CVEs in the
code, suggesting further work is justified
So the proposal is now to follow three stages:
1. Refactor the instance storage code
2. Adapt to use storage pools for the instance storage
3. Use storage pools to drive resize/migration
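To make the direction concrete, here is a rough sketch of what
allocating instance storage through a libvirt storage pool looks like
with the libvirt-python bindings. This is illustrative only, not the
Nova code; the pool and volume names are made up:

import libvirt

POOL_XML = """
<pool type='dir'>
  <name>nova-instances</name>
  <target><path>/var/lib/nova/instances</path></target>
</pool>
"""

VOLUME_XML = """
<volume>
  <name>instance-0001.qcow2</name>
  <capacity unit='G'>10</capacity>
  <target><format type='qcow2'/></target>
</volume>
"""

conn = libvirt.open('qemu:///system')
try:
    pool = conn.storagePoolLookupByName('nova-instances')
except libvirt.libvirtError:
    # Define and start the pool on first use.
    pool = conn.storagePoolDefineXML(POOL_XML, 0)
    pool.setAutostart(1)
    pool.create(0)

# Creating the volume through the pool means libvirt tracks the
# storage itself, which is what later lets it drive resize and
# migration instead of us inspecting files by path.
vol = pool.createXML(VOLUME_XML, 0)
print(vol.path())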
We also talked about the need for some additional test coverage for the
refactor work:
1. A job that uses LVM on the experimental queue.
2. ploop should be covered by the Virtuozzo Compute third-party CI, but
we'll need to double-check the test coverage there (is it running the
tests that hit the code paths being refactored?). Note that they have
their own blueprint for implementing resize for ploop:
https://blueprints.launchpad.net/nova/+spec/virtuozzo-instance-resize-support
3. Ceph testing - we already have a single-node job for Ceph that will
test the resize paths. We should also be testing Ceph-backed live
migration in the special live-migration job that Timofey has been
working on.
4. NFS testing - this also falls into the special live migration CI job
that will test live migration in different storage configurations within
a single run.
Matt already has code starting the refactor and will continue with
help from Paul Carlton and Paul Murray. We will look for additional
contributors to help as we plan out the patches.
https://review.openstack.org/#/c/302117 : Persist libvirt instance
storage metadata
https://review.openstack.org/#/c/310505 : Use libvirt storage pools
https://review.openstack.org/#/c/310538 : Migrate libvirt volumes
Post copy
The spec to add post copy migration support in the libvirt driver was
discussed in the live migration session. Post copy guarantees completion
of a migration in linear time without needing to pause the VM. This can
be used as an alternative to pausing in live-migration-force-complete.
Pause or complete could also be invoked automatically under some
circumstances. The issue slowing these specs is how to decide which
method to use, given that they provide different user experiences but
we don't want to expose virt-specific features in the API. Two
additional specs listed below suggest possible generic ways to address
the issue. No conclusions were reached in the session, so the debate
will continue on the specs. The first link below is the main spec for
the feature.
https://review.openstack.org/#/c/301509 : Adds post-copy live migration
support to Nova
https://review.openstack.org/#/c/305425 : Define instance availability
profiles
https://review.openstack.org/#/c/306561 : Automatic Live Migration
Completion
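For reference, switching a running migration to post-copy looks
roughly like this with libvirt-python. This is a sketch only, not the
proposed Nova code; the domain name and destination URI are made up:

import threading
import libvirt

DST_URI = 'qemu+ssh://dest-host/system'  # made-up destination

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-0001')  # made-up domain name

# VIR_MIGRATE_POSTCOPY only *permits* post-copy; the migration still
# starts as a normal pre-copy.
flags = (libvirt.VIR_MIGRATE_LIVE |
         libvirt.VIR_MIGRATE_PEER2PEER |
         libvirt.VIR_MIGRATE_POSTCOPY)

# migrateToURI3 blocks until the migration finishes, so run it in a
# worker thread and watch progress from here.
worker = threading.Thread(target=dom.migrateToURI3,
                          args=(DST_URI, {}, flags))
worker.start()

# ... monitor dom.jobStats() here; if memory copying is not
# converging, flip the running migration into post-copy. The VM
# resumes on the destination immediately and pulls remaining pages on
# demand, so completion time is linear without pausing the guest.
dom.migrateStartPostCopy(0)
worker.join()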
Live Migration orchestrated via conductor
The proposal to move orchestration of live migration to conductor was
discussed in the working session on Friday, presented by Andrew Laski on
behalf of Timofey Durakov. This one generated a lot of debate both for
and against the general idea, though there was no support for the
patches that have been submitted along with the spec so far. The
general feeling was that we need to attack this, but should take some
simple cleanup steps first to get a better idea of the problem. Dan
Smith proposed moving the stateless pre-migration steps to a sequence
of calls from conductor (as opposed to the current back and forth
between the computes) as the first step.
https://review.openstack.org/#/c/292271 : Remove compute-compute
communication in live-migration
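To illustrate what that first step might look like, here is a
hypothetical sketch of conductor driving the stateless pre-migration
steps as a straight sequence of calls. The method names are modeled
loosely on the existing compute RPC calls, but the task class and the
signatures here are illustrative, not Nova's actual interfaces:

class LiveMigrationTask:
    def __init__(self, compute_rpcapi, instance, source, destination):
        self.compute_rpcapi = compute_rpcapi
        self.instance = instance
        self.source = source
        self.destination = destination

    def execute(self):
        # Each step is a straight conductor -> compute call, so no
        # state lives on the computes between steps and failures can
        # be unwound from one place.
        src_data = self.compute_rpcapi.check_can_live_migrate_source(
            self.source, self.instance)
        dst_data = self.compute_rpcapi.check_can_live_migrate_destination(
            self.destination, self.instance, src_data)
        self.compute_rpcapi.pre_live_migration(
            self.destination, self.instance, dst_data)
        # Only the migration itself remains on the source compute.
        self.compute_rpcapi.live_migration(
            self.source, self.instance, self.destination, dst_data)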
Cold and Live Migration Scheduling
When this patch merges, all migrations will use the request spec for
scheduling: https://review.openstack.org/#/c/284974
Work is still ongoing for check destinations (allowing the scheduler to
check a destination chosen by the admin). When that is complete,
migrations can be placed in one of three ways:
1. Destination chosen by scheduler
2. Destination chosen by admin but checked by scheduler
3. Destination forced by admin
https://review.openstack.org/#/c/296408 : Re-proposes checking the
destination on migrations
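As an illustration only (the spec attributes below are made up, not
the actual RequestSpec fields), the three paths might look like:

def pick_destination(scheduler, spec):
    if spec.forced_host:
        # 3. Forced by the admin: bypass the scheduler entirely.
        return spec.forced_host
    if spec.requested_destination:
        # 2. Chosen by the admin but checked: the scheduler runs its
        # filters against just that one host.
        return scheduler.select(spec, hosts=[spec.requested_destination])
    # 1. Chosen by the scheduler from all candidate hosts.
    return scheduler.select(spec)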
PCI + NUMA claims
Moshe and Jay are making great progress refactoring Nicola’s patches to
fix PCI and NUMA handling in migrations. The patch series should be
completed soon.
The patch series for that is here (it depends on some cleanups from
Jay, and the top patch needs to be rebased):
https://review.openstack.org/#/c/307124/
It would be great if we could test this with some NFV CI, but from the
notes in the session it sounds like we need a multi-node job for this?
Thanks for the great write-up Paul, you've saved me some time. :) And
thanks to the whole sub-team working on this for keeping up the focus.
--
Thanks,
Matt Riedemann