Peter Xu <pet...@redhat.com> writes: > Firstly, the "Paused" state was added in the wrong place before. The state > machine section was describing PostcopyState, rather than MigrationStatus. > Drop the Paused state descriptions. > > Then in the postcopy recover session, add more information on the state > machine for MigrationStatus in the lines. Add the new RECOVER_SETUP phase. > > Signed-off-by: Peter Xu <pet...@redhat.com>
Reviewed-by: Fabiano Rosas <faro...@suse.de> > --- > docs/devel/migration/postcopy.rst | 31 ++++++++++++++++--------------- > 1 file changed, 16 insertions(+), 15 deletions(-) > > diff --git a/docs/devel/migration/postcopy.rst > b/docs/devel/migration/postcopy.rst > index 6c51e96d79..a15594e11f 100644 > --- a/docs/devel/migration/postcopy.rst > +++ b/docs/devel/migration/postcopy.rst > @@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END > (although it can't do the cleanup it would do as it > finishes a normal migration). > > - - Paused > - > - Postcopy can run into a paused state (normally on both sides when > - happens), where all threads will be temporarily halted mostly due to > - network errors. When reaching paused state, migration will make sure > - the qemu binary on both sides maintain the data without corrupting > - the VM. To continue the migration, the admin needs to fix the > - migration channel using the QMP command 'migrate-recover' on the > - destination node, then resume the migration using QMP command 'migrate' > - again on source node, with resume=true flag set. > - > - End > > The listen thread can now quit, and perform the cleanup of migration > @@ -221,7 +210,8 @@ paused postcopy migration. > > The recovery phase normally contains a few steps: > > - - When network issue occurs, both QEMU will go into PAUSED state > + - When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED** > + migration state. > > - When the network is recovered (or a new network is provided), the admin > can setup the new channel for migration using QMP command > @@ -229,9 +219,20 @@ The recovery phase normally contains a few steps: > > - On source host, the admin can continue the interrupted postcopy > migration using QMP command 'migrate' with resume=true flag set. > - > - - After the connection is re-established, QEMU will continue the postcopy > - migration on both sides. > + Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to > + re-establish the channels. > + > + - When both sides of QEMU successfully reconnects using a new or fixed up s/reconnects/reconnect I can touch it up when queueing > + channel, they will go into **POSTCOPY_RECOVER** state, some handshake > + procedure will be needed to properly synchronize the VM states between > + the two QEMUs to continue the postcopy migration. For example, there > + can be pages sent right during the window when the network is > + interrupted, then the handshake will guarantee pages lost in-flight > + will be resent again. > + > + - After a proper handshake synchronization, QEMU will continue the > + postcopy migration on both sides and go back to **POSTCOPY_ACTIVE** > + state. Postcopy migration will continue. > > During a paused postcopy migration, the VM can logically still continue > running, and it will not be impacted from any page access to pages that