Re: [Qemu-devel] [PATCH RESEND 1/2] Block: Block replication design for COLO

Eric Blake Wed, 25 Mar 2015 09:08:34 -0700

On 12/25/2014 08:31 PM, Yang Hongyang wrote:
> This is the initial design of block replication.
> The blkcolo block driver enables disk replication for continuous
> checkpoints. It is designed for COLO that Secondary VM is running.
> It can also be applied for FT/HA scene that Secondary VM is not
> running.
> 
> Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
> Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
> Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
> ---
>  docs/blkcolo.txt | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
>  create mode 100644 docs/blkcolo.txt


Grammar review only (I'll leave the technical review to others).

> 
> diff --git a/docs/blkcolo.txt b/docs/blkcolo.txt
> new file mode 100644
> index 0000000..41c2a05
> --- /dev/null
> +++ b/docs/blkcolo.txt
> @@ -0,0 +1,85 @@
> +Disk replication using blkcolo
> +----------------------------------------
> +Copyright Fujitsu, Corp. 2014

Visually, the separator line should match the length of the line above,
and maybe have a blank line after.

> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +The blkcolo block driver enables disk replication for continuous checkpoints.
> +It is designed for COLO that Secondary VM is running. It can also be applied

Similar comments apply here as for Wen's RFC COLO v2 series for
docs/block-replication.txt (in fact, do we need two files, or should all
this information be merged into a single file?):

s/for COLO that/for COLO (COarse-grained LOck-stepping replication), where/

> +for FT/HA scene that Secondary VM is not running.

s/for FT/HA scene that/to FT/HA (Fault Tolerance/High Availability)
scenarios, where/

> +
> +This document gives an overview of blkcolo's design.
> +
> +== Background ==
> +High availability solutions such as micro checkpoint and COLO will do
> +consecutive checkpoint. The VM state of Primary VM and Secondary VM is

s/checkpoint/checkpoints/

> +identical right after a VM checkpoint, but becomes different as the VM
> +executes till the next checkpoint. To support disk contents checkpoint,
> +the modified disk contents in the Secondary VM must be buffered, and are
> +only dropped at next checkpoint time. To reduce the network transportation
> +effort at the time of checkpoint, the disk modification operations of
> +Primary disk are asynchronously forwarded to the Secondary node.
> +
> +== Disk Buffer ==
> +The following is the image of Disk buffer:
> +
> +        +----------------------+            +------------------------+
> +        |Primary Write Requests|            |Secondary Write Requests|
> +        +----------------------+            +------------------------+
> +                  |                                       |
> +                  |                                      (4)
> +                  |                                       V
> +                  |                              /-------------\
> +                  |      Copy and Forward        |             |
> +                  |---------(1)----------+       | Disk Buffer |
> +                  |                      |       |             |
> +                  |                     (3)      \-------------/
> +                  |                 speculative      ^
> +                  |                write through    (2)
> +                  |                      |           |
> +                  V                      V           |
> +           +--------------+           +----------------+
> +           | Primary Disk |           | Secondary Disk |
> +           +--------------+           +----------------+
> +    1) Primary write requests will be copied and forwarded to Secondary
> +       QEMU.
> +    2) Before Primary write requests are written to Secondary disk, the
> +       original sector content will be read from Secondary disk and
> +       buffered in the Disk buffer, but it will not overwrite the existing
> +       sector content in the Disk buffer.
> +    3) Primary write requests will be written to Secondary disk.
> +    4) Secondary write requests will be bufferd in the Disk buffer and it

s/bufferd/buffered/

> +       will overwrite the existing sector content in the buffer.
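
To check my reading of steps 1) to 4), here is a minimal Python sketch of
the Disk buffer semantics. It is a toy model, not the driver: the class and
method names are made up for illustration, sectors are modeled as dictionary
keys, and the read path (Secondary reads prefer the buffer) is my assumption,
since the text above does not describe reads.

```python
class SecondaryDisk:
    """Toy model of the Secondary disk plus the Disk buffer (illustrative names)."""

    def __init__(self, sectors):
        self.disk = dict(sectors)  # sector number -> content on the Secondary disk
        self.buffer = {}           # Disk buffer: sector number -> buffered content

    def primary_write(self, sector, data):
        # Step 2: save the original sector content, but never overwrite
        # content that is already in the Disk buffer.
        if sector not in self.buffer:
            self.buffer[sector] = self.disk.get(sector)
        # Step 3: write the forwarded Primary request to the Secondary disk.
        self.disk[sector] = data

    def secondary_write(self, sector, data):
        # Step 4: Secondary writes only touch the Disk buffer and overwrite
        # whatever is buffered for that sector.
        self.buffer[sector] = data

    def secondary_read(self, sector):
        # Assumption (not stated in the patch): the Secondary VM reads the
        # buffer first and falls back to the Secondary disk.
        if sector in self.buffer:
            return self.buffer[sector]
        return self.disk.get(sector)


# Small walk-through of the ordering rules.
s = SecondaryDisk({0: "a", 1: "b"})
s.primary_write(0, "A")      # buffer keeps the original "a", disk gets "A"
s.secondary_write(1, "B2")   # buffer gets "B2", disk still has "b"
s.primary_write(1, "B")      # sector 1 is already buffered, so "B2" is kept
```

If this model matches the intent, it may be worth stating the read path
explicitly in the document.
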
> +
> +== Capture I/O request ==
> +The blkcolo is a new block driver protocol, so all I/O requests can be
> +captured in the driver interface bdrv_co_readv()/bdrv_co_writev().
> +
> +== Checkpoint & failover ==
> +The blkcolo buffers the write requests in Secondary QEMU. And the buffer
> +should be dropped at a checkpoint, or be flushed to Secondary disk when

s/when/on/

> +failover. We add four block driver interfaces to do this:
> +a. bdrv_prepare_checkpoint()
> +   This interface may block, and return when all Primary write

s/return/returns/

> +   requests are forwarded to Secondary QEMU.
> +b. bdrv_do_checkpoint()
> +   This interface is called after all VM state is transfered to

s/transfered/transferred/

> +   Secondary QEMU. The Disk buffer will be dropped in this interface.
> +c. bdrv_get_sent_data_size()
> +   This is used on Primary node.
> +   It should be called by migration/checkpoint thread in order
> +   to decide whether to start a new checkpoint or not. If the data
> +   amount being sent is too large, we should start a new checkpoint.
> +d. bdrv_stop_replication()
> +   It is called when failover. We will flush the Disk buffer into

s/when/on/

> +   Secondary Disk and stop disk replication.
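
As a reading aid for interfaces b, c and d, the following Python sketch
models what each one does to the Disk buffer as described above. The function
names only mirror the bdrv_*() names loosely, the dictionaries stand in for
the real disk and buffer, and limit_bytes is a hypothetical threshold, since
the text does not say what "too large" means.

```python
def do_checkpoint(disk, buffer):
    """Model of b: drop the Disk buffer; the Secondary disk is left untouched."""
    buffer.clear()


def stop_replication(disk, buffer):
    """Model of d: on failover, flush the Disk buffer into the Secondary disk."""
    disk.update(buffer)
    buffer.clear()


def should_start_checkpoint(sent_bytes, limit_bytes):
    """Model of the decision in c; limit_bytes is a hypothetical threshold."""
    return sent_bytes > limit_bytes
```

After do_checkpoint() the Secondary disk holds only what the Primary wrote;
after stop_replication() it holds the Secondary VM's view instead.
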
> +
> +== Usage ==
> +On both Primary/Secondary host, invoke QEMU with the following parameters:
> +    "-drive file=blkcolo:host:port:/path/to/image"
> +a. host
> +   Hostname or IP of the Secondary host.
> +b. port
> +   The Secondary QEMU will listen on this port, and the Primary QEMU
> +   will connect to this port.
> 
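
For readers unfamiliar with this filename form, here is a small Python
sketch of how I read the "blkcolo:host:port:/path/to/image" syntax. The
function name is hypothetical and this is not how QEMU parses it; it only
illustrates which field is which. Note that a plain colon split like this
would not cope with IPv6 literals in the host field.

```python
def parse_blkcolo_filename(filename):
    """Split 'blkcolo:host:port:/path/to/image' into (host, port, path)."""
    # Split at most three times so that colons inside the path are preserved.
    prefix, host, port, path = filename.split(":", 3)
    if prefix != "blkcolo":
        raise ValueError("not a blkcolo filename: %r" % filename)
    return host, int(port), path


host, port, path = parse_blkcolo_filename("blkcolo:192.168.0.2:8889:/path/to/image")
```
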

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
