> -----Original Message-----
> From: Monty Taylor [mailto:mord...@inaugust.com]
> Sent: 01 June 2016 13:54
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [Nova] State machines in Nova
> 
> On 06/01/2016 03:50 PM, Andrew Laski wrote:
> >
> >
> > On Wed, Jun 1, 2016, at 05:51 AM, Miles Gould wrote:
> >> On 31/05/16 21:03, Timofei Durakov wrote:
> >>> there is blueprint[1] that was approved during Liberty and
> >>> resubmitted to Newton(with spec[2]).
> >>> The idea is to define state machines for operations as
> >>> live-migration, resize, etc. and to deal with them operation states.
> >>
> >> +1 to introducing an explicit state machine - IME they make complex
> >> logic much easier to reason about. However, think carefully about how
> >> you'll make changes to that state machine later. In Ironic, this is
> >> an ongoing problem: every time we change the state machine, we have
> >> to decide whether to lie to older clients (and if so, what lie to
> >> tell them), or whether to present them with the truth (and if so, how
> >> badly they'll break). AIUI this would be a much smaller problem if
> >> we'd considered this possibility carefully at the beginning.
> >
> > This is a great point. I think most people have an implicit assumption
> > that the state machine will be exposed to end users via the API. I
> > would like to avoid that for exactly the reason you've mentioned. Of
> > course we'll want to expose something to users but whatever that is
> > should be loosely coupled with the internal states that actually drive the
> > system.
> 
> +1billion
> 

I think this raises an interesting point.

tl;dr: I am starting to think we should not do the migration state machine spec 
being proposed before the tasks. But we should at least make the states we 
assign something other than arbitrary strings (e.g. constants defined in a 
particular place) and we should use the state names consistently.
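
As a rough illustration of what I mean by "constants defined in a particular
place", something like the sketch below would do. The class name, the function
and the state names are made up for the example and are not the ones Nova uses
today.

```python
import enum


class MigrationStatus(str, enum.Enum):
    """Hypothetical migration states, defined once and imported everywhere."""

    QUEUED = "queued"
    PREPARING = "preparing"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


def parse_status(raw: str) -> MigrationStatus:
    # Looking a value up through the enum raises ValueError for a typo such
    # as "runing", which a bare string comparison would silently accept.
    return MigrationStatus(raw)
```

Because the enum mixes in str, existing code that compares against the plain
string value would keep working while callers move over to the constants.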

Transitions can come from two places:

1) The user invokes the API to change the state of an instance. This is a good
   place to check that the instance is in a state that permits the externally
   visible transition.
2) The state of the instance changes due to an internal event (host crash,
   deliberate operation, ...). This implies a change in the externally visible
   state of the instance, but it cannot be prevented just because the state
   machine says it shouldn't happen (usually this is captured by the error
   state, but sometimes we can do better).
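
To make the distinction concrete, here is a minimal sketch, assuming a toy set
of states, actions and events that I have invented for the example (none of
these names come from Nova). User requests are validated and can be refused;
internal events are only recorded, because they have already happened.

```python
# Hypothetical table of user-invoked transitions: (current state, action).
ALLOWED_USER_TRANSITIONS = {
    ("active", "stop"): "stopped",
    ("stopped", "start"): "active",
    ("active", "pause"): "paused",
    ("paused", "unpause"): "active",
}

# Hypothetical internal events and the state each one leaves the instance in.
INTERNAL_EVENTS = {
    "host_crashed": "error",
    "guest_shutdown": "stopped",  # a case where we can do better than "error"
}


class InvalidTransition(Exception):
    """Raised when a user asks for a transition the current state forbids."""


def request_action(state: str, action: str) -> str:
    # API path: refuse anything the table does not explicitly allow.
    try:
        return ALLOWED_USER_TRANSITIONS[(state, action)]
    except KeyError:
        raise InvalidTransition(f"cannot {action} while {state}") from None


def record_event(state: str, event: str) -> str:
    # Internal path: the event has already occurred, so it is never rejected.
    # Unknown events fall back to the error state.
    return INTERNAL_EVENTS.get(event, "error")
```

Note that record_event() ignores the current state on purpose: whatever the
state machine thought was allowed, the new state has to reflect what happened.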

I think the state machines that are being defined in these changes are actually 
high level phases of the migration process that are currently observed by the 
user. I'm not sure they are particularly useful for coordinating the migration 
process itself and so are maybe not the right place to enforce internal 
transitions.

Live migration is an oddity in Nova. Usually an instance is a single entity 
running on a single host (ignoring clustered hypervisors for the moment). There 
is a host manager responsible for that host that has the best view of the 
actual state of the instance or operations being performed on it. Generally the 
host manager is the natural place to coordinate operations on the instance.

In the case of live migration there are actually two VMs running on different 
hosts at the same time. The migration process involves coordinating transitions 
of those two VMs (attaching disks, plugging networks, starting the target VM, 
starting the migration, rebinding ports, stopping the source VM, ...). The two 
VMs and their own individual states in this process are not represented 
explicitly. We only have an overall process coordinated by a distributed 
sequence of RPCs. There is a current spec moving that coordination to the 
conductor. When that sequence is interrupted or even completely lost (e.g. by a 
conductor failing or being restarted) we get into trouble. I think this is 
where our real problem lies.

We should sort out the internal process. The external view given to the user 
can be a true reflection of the current state of the instance. The transitions of 
the instance should be internally coordinated.
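
One way to picture this, purely as a sketch: track the source and target VMs
explicitly and derive what the user sees from the pair. The state names and the
mapping below are assumptions made for the example, not a proposal for the
actual values.

```python
import enum


class VmState(enum.Enum):
    """Hypothetical per-host state of one side of a live migration."""

    ABSENT = "absent"
    PREPARED = "prepared"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"


def external_status(source: VmState, target: VmState) -> str:
    """Derive the user-visible status from the two internal VM states."""
    if source is VmState.RUNNING:
        # The source is still serving the guest, so the instance remains
        # usable even if the target side was never created or has failed.
        if target in (VmState.PREPARED, VmState.RUNNING):
            return "migrating"
        return "active"
    if target is VmState.RUNNING:
        # The guest now lives on the target host; the migration is complete.
        return "active"
    # Neither side is running the guest.
    return "error"
```

The point of the design is that the user-visible value is computed from
internal state rather than stored as the thing that drives the process, so the
internal states can change without breaking what clients see.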

Paul

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
