From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> --- docs/migration.txt | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+)
diff --git a/docs/migration.txt b/docs/migration.txt
index 0492a45..fec2d46 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -294,3 +294,153 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd
+(although possibly with another fd or similar for some fast way of
+throwing pages across).
+
+However, some uses need two-way comms; in particular the postcopy
+destination needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the
+source; qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile*
+for the return path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by fd_handler on main thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+Opening the return path generally sets the fd to be non-blocking so that a
+failed destination can't block the source; since the non-blocking behaviour
+applies to both directions, it also alters the semantics of the forward path.
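+
+As an illustration only (this sketch is not taken from the QEMU source;
+only qemu_file_get_return_path() comes from the description above, and the
+helper name, globals and message layout below are made up), a destination
+might open the return path and send a request down it roughly like this,
+taking rp_mutex because both the main thread and the postcopy thread can
+write to it:
+
+    #include "qemu-common.h"        /* header paths vary between versions */
+    #include "migration/qemu-file.h"
+    #include "qemu/thread.h"
+
+    static QEMUFile *to_src_file;   /* the return path                    */
+    static QemuMutex rp_mutex;      /* assumed initialised elsewhere with
+                                       qemu_mutex_init(); serialises the
+                                       writers on the return path         */
+
+    static void open_return_path(QEMUFile *from_src_file)
+    {
+        /* NULL here means this transport has no way back to the source */
+        to_src_file = qemu_file_get_return_path(from_src_file);
+    }
+
+    /* Hypothetical: ask the source for 'len' bytes of RAM at 'offset' */
+    static void send_rp_page_request(uint64_t offset, uint32_t len)
+    {
+        qemu_mutex_lock(&rp_mutex);
+        qemu_put_be16(to_src_file, 1);   /* made-up 'request pages' type */
+        qemu_put_be64(to_src_file, offset);
+        qemu_put_be32(to_src_file, len);
+        qemu_fflush(to_src_file);
+        qemu_mutex_unlock(&rp_mutex);
+    }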
+
+= Postcopy =
+
+'Postcopy' migration is a way to deal with migrations that refuse to
+converge; its plus side is that there is an upper bound on the amount of
+migration traffic and the time it takes, the downside is that during the
+postcopy phase, a failure of *either* side or of the network connection
+causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if
+precopy doesn't finish in a given time the switch is automatically made to
+postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_ram_state)
+from ADVISE->LISTEN->RUNNING->END.
+
+ Advise:  Set at the start of migration if postcopy is enabled, even
+          if it hasn't passed the start-time threshold; here the
+          destination checks that its OS has the support needed for
+          postcopy, and performs setup to ensure the RAM mappings are
+          suitable for later postcopy.
+          (Triggered by reception of the POSTCOPY_RAM_ADVISE command)
+
+Normal precopy now carries on as usual, until the point that the source
+hits the start-time threshold and transitions to postcopy.  The source
+stops its CPUs and transmits a 'discard bitmap' indicating pages that
+have been previously sent but are now dirty again and hence are out of
+date on the destination.
+
+The migration stream now contains a 'package' containing its own chunk
+of migration stream, followed by a return to a normal stream containing
+page data.  The package (sent as CMD_PACKAGED) contains the commands to
+cycle the states on the destination, followed by all of the device
+state excluding RAM.  This lets the destination request pages from the
+source in parallel with loading device state; this is required since
+some devices (e.g. virtio) access guest memory during device
+initialisation.
+
+ Listen:  The first command in the package, POSTCOPY_RAM_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the package.  With this thread able to process
+          page reception, the destination now 'sensitises' the RAM to
+          detect any access to missing pages (on Linux using the
+          'userfault' system).
+
+The package now contains all the remaining state data and the command
+to transition to the next state.
+
+ Running: POSTCOPY_RAM_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          then carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+Page data is sent from the source to the destination as part of a linear
+scan (like normal migration) and is received by the 'listen thread'.
+When the destination tries to use a page it hasn't yet got, it requests
+it from the source (down the return path) and the source sends that
+page in the same stream.  When the source has transmitted all pages
+it sends a POSTCOPY_RAM_END command to transition to:
+
+ End:     The listen thread can now quit and perform the cleanup of
+          migration state; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy: the 'migration bitmap'
+and the 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit per page to indicate that the page is
+'dirty' - i.e. needs sending.  During the precopy phase this is updated
+as the CPUs dirty pages, however during postcopy the CPUs are stopped
+and nothing should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy.  It is a bitmap
+that has a bit set whenever a page is sent to the destination, however
+during the transition to postcopy mode it is masked against the migration
+bitmap (sentmap &= migrationbitmap) to generate a bitmap recording pages
+that have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination, which discards those now-dirty pages
+before starting the CPUs.
+
+Note that once in postcopy mode, the sent map is still updated; however
+its contents are not consistent as a local view of what has been sent,
+since it only holds the masked result.
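+
+As a rough illustration of the masking step above (again not lifted from
+the QEMU source; the bitmap names and the use of the generic bitmap_and()
+helper are assumptions), the discard bitmap could be built like this:
+
+    #include "qemu/bitmap.h"   /* bitmap_and(); header path may vary */
+
+    /*
+     * Build the 'discard bitmap' sent to the destination at the
+     * precopy->postcopy transition: a page needs discarding only if it
+     * was already sent (sentmap) AND has been dirtied again since
+     * (migration_bitmap).
+     */
+    static void build_discard_bitmap(unsigned long *sentmap,
+                                     const unsigned long *migration_bitmap,
+                                     long nr_pages)
+    {
+        /* sentmap &= migration_bitmap, over all pages */
+        bitmap_and(sentmap, sentmap, migration_bitmap, nr_pages);
+        /* sentmap now marks pages that are stale on the destination and
+         * must be discarded there before its CPUs are started. */
+    }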
+
+=== Destination side page maps ===
+
+(Needs to be changed so we can update both easily - at the moment updates
+ are done with a lock)
+
+The destination keeps a 'requested map' and a 'received map'.
+Both maps are initially 0; as pages are received the corresponding bits
+are set in the 'received map'.  Incoming requests from the kernel cause
+the bit to be set in the 'requested map'.  When a page is received that
+is marked as 'requested', the kernel is notified.  If the kernel requests
+a page that has already been 'received', the kernel is notified without
+re-requesting.
+
+This leads to three valid page states (rc = bit set in received map,
+rq = bit set in requested map):
+
+    missing   (!rc,!rq) - page not yet received or requested
+    received  ( rc,!rq) - page received
+    requested (!rc, rq) - page requested but not yet received
+
+and the following state transitions:
+
+    received  -> missing   (only during setup/discard)
+    missing   -> received  (normal incoming page)
+    requested -> received  (incoming page previously requested)
+    missing   -> requested (userfault request)
-- 
1.9.3