Ping...

Hi Juan & Amit,

Could you please help review this series? Since it has already reached v9, I really hope to get your feedback on it :)

Thanks,
zhanghailiang

On 2015/9/2 16:22, zhanghailiang wrote:
This is the 9th version of COLO.

Please note that this version is very different from the previous versions. Since we have decided to implement the proxy in qemu, based on slirp, we dropped all the original colo proxy related parts. It will be a long time before the proxy is ready for merging, so here we extract the basic periodic checkpoint part that does not depend on the proxy into this series.

Actually, the 'periodic' mode is also something we want to support in COLO. It is based on Yang Hongyang's netfilter series, and this mode is very similar to MicroCheckpointing and Remus.

You can find the discussion about why and how to implement the colo proxy in qemu at the following link:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html

As usual, this is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.0-periodic-mode

Compared with previous versions, this version is easier to test.

Test procedure:
1. Startup qemu
Primary side:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 -netfilter buffer,id=f0,netdev=bn0,chain=in -device virtio-net-pci,id=net-pci0,netdev=bn0 -boot c -drive if=virtio,id=disk1,driver=quorum,read-pattern=fifo,cache=none,aio=native,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=raw -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -S

Secondary side:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 -drive if=none,driver=raw,file=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,id=colo1,cache=none,aio=native -drive if=virtio,driver=replication,mode=secondary,throttling.bps-total=70000000,file.file.filename=/mnt/ramfs/active_disk.img,file.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.driver=qcow2,file.backing.backing.backing_reference=colo1,file.backing.allow-write-backing-file=on -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:8888

2. On the Secondary VM's QEMU monitor, issue the commands:
(qemu) nbd_server_start 192.168.2.88:8889
(qemu) nbd_server_add -w colo1

3. On the Primary VM's QEMU monitor, issue the commands:
(qemu) child_add disk1 child.driver=replication,child.mode=primary,child.file.host=192.168.2.88,child.file.port=8889,child.file.export=colo1,child.file.driver=nbd,child.ignore-errors=on
(qemu) migrate_set_capability colo on
(qemu) migrate tcp:192.168.2.88:8888

4. After the above steps, you will see that whenever you make changes to the PVM, the SVM will be synced.
You can issue the command "migrate_set_parameter checkpoint-delay 2000" to change the checkpoint period.

5. Failover test
You can kill the PVM and run 'colo_lost_heartbeat' in the SVM's monitor at the same time; the SVM will then fail over, and the client will not notice the change.

COLO is a totally new feature which is still at an early stage; your comments and feedback are warmly welcomed.

TODO:
1. checkpoint based on proxy in qemu
2. The capability of continuous FT

v9:
- Drop colo proxy related part (colo-nic.c file)
- Convert COLO protocol name definition to QAPI
- Smash failover related patch (patch 19/20/23)
- Fix colo exit event according to Eric's comments
- Fix some typos from Eric's comments
- Fix bug "invalid runstate transition: 'colo' -> 'prelaunch'" reported by Dave (patch 27)
- Use migrate_set_parameter instead of ecolo-set-checkpoint-period to set checkpoint delay time (patch 25)
- Add new patches (patch 29/30) to separate the process of saving/loading ram and device state during checkpoint, which will reduce the data size for sending and also reduce the qsb size used in checkpoint.

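For anyone who wants to script steps 2 and 3 of the test procedure against their own addresses, here is a minimal sketch that only assembles the monitor command strings. It is not part of the series: the function names and parameters are hypothetical, and the command text is taken verbatim from the test procedure, with the IP, ports, export name and quorum drive id substituted in.

```python
def secondary_monitor_commands(secondary_ip, nbd_port, export):
    """Commands for the Secondary VM's monitor (step 2 of the test procedure)."""
    return [
        f"nbd_server_start {secondary_ip}:{nbd_port}",
        f"nbd_server_add -w {export}",
    ]


def primary_monitor_commands(secondary_ip, nbd_port, migrate_port, export,
                             quorum_id="disk1"):
    """Commands for the Primary VM's monitor (step 3 of the test procedure)."""
    # Options of the replication child that is attached to the quorum drive.
    child_opts = ",".join([
        "child.driver=replication",
        "child.mode=primary",
        f"child.file.host={secondary_ip}",
        f"child.file.port={nbd_port}",
        f"child.file.export={export}",
        "child.file.driver=nbd",
        "child.ignore-errors=on",
    ])
    return [
        f"child_add {quorum_id} {child_opts}",
        "migrate_set_capability colo on",
        f"migrate tcp:{secondary_ip}:{migrate_port}",
    ]


if __name__ == "__main__":
    # Values from the test procedure; replace them with your own setup.
    for cmd in secondary_monitor_commands("192.168.2.88", 8889, "colo1"):
        print("(secondary)", cmd)
    for cmd in primary_monitor_commands("192.168.2.88", 8889, 8888, "colo1"):
        print("(primary)  ", cmd)
```

The secondary commands have to be issued first, since the primary's replication child connects to the NBD export that they create.
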
Wen Congyang (1):
  COLO: Add block replication into colo process

zhanghailiang (31):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'colo' to migration
  COLO: migrate colo related info to slave
  migration: Add state records for migration incoming
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  migration: Rename the'file' member of MigrationState and MigrationIncomingState
  COLO/migration: establish a new communication path from destination to source
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save PVM state to secondary side when do checkpoint
  COLO: Load PVM's dirty pages into SVM's RAM cache temporarily
  COLO: Load VMState into qsb before restore it
  COLO: Flush PVM's cached RAM into SVM's memory
  COLO: synchronize PVM's state to SVM periodically
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Introduce state to record failover process
  COLO: Implement failover work for Primary VM
  COLO: Implement failover work for Secondary VM
  COLO: implement default failover treatment
  qmp event: Add event notification for COLO error
  COLO failover: Shutdown related socket fd when do failover
  COLO failover: Don't do failover during loading VM's state
  COLO: Control the checkpoint delay time by migrate-set-parameters command
  COLO: Implement shutdown checkpoint
  COLO: Update the global runstate after going into colo state
  savevm: Split load vm state function qemu_loadvm_state
  COLO: Separate the process of saving/loading ram and device state
  COLO: Split qemu_savevm_state_begin out of checkpoint process
  COLO: Add net packets treatment into COLO

 configure                     |  11 +
 docs/qmp/qmp-events.txt       |  17 +
 hmp-commands.hx               |  15 +
 hmp.c                         |  16 +
 hmp.h                         |   1 +
 include/exec/cpu-all.h        |   1 +
 include/migration/colo.h      |  44 +++
 include/migration/failover.h  |  33 ++
 include/migration/migration.h |  16 +-
 include/migration/qemu-file.h |   3 +-
 include/sysemu/sysemu.h       |   8 +
 migration/Makefile.objs       |   2 +
 migration/colo-comm.c         |  75 ++++
 migration/colo-failover.c     |  83 +++++
 migration/colo.c              | 782 ++++++++++++++++++++++++++++++++++++++++++
 migration/exec.c              |   4 +-
 migration/fd.c                |   4 +-
 migration/migration.c         | 184 +++++++---
 migration/qemu-file-buf.c     |  58 ++++
 migration/ram.c               | 185 +++++++++-
 migration/savevm.c            | 309 +++++++++++++----
 migration/tcp.c               |   4 +-
 migration/unix.c              |   4 +-
 qapi-schema.json              | 101 +++++-
 qapi/event.json               |  17 +
 qmp-commands.hx               |  20 ++
 stubs/Makefile.objs           |   1 +
 stubs/migration-colo.c        |  45 +++
 trace-events                  |   8 +
 vl.c                          |  37 +-
 30 files changed, 1930 insertions(+), 158 deletions(-)
 create mode 100644 include/migration/colo.h
 create mode 100644 include/migration/failover.h
 create mode 100644 migration/colo-comm.c
 create mode 100644 migration/colo-failover.c
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c
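
For readers new to the 'periodic' mode described in the cover letter, the toy model below may help. It is only a sketch under an assumption: that the mode follows Remus-style semantics, where the primary holds back outgoing packets (the role of the netfilter buffer) and releases them only after the secondary has received the checkpoint. It is not the code in migration/colo.c, and every class and function name in it is hypothetical.

```python
class ToySecondary:
    """Holds the last checkpointed state; this is what failover would resume from."""

    def __init__(self):
        self.state = 0

    def load(self, state):
        self.state = state


class ToyPrimary:
    """Runs the guest and buffers its output between checkpoints."""

    def __init__(self):
        self.state = 0
        self.buffered = []   # packets held back since the last checkpoint
        self.released = []   # packets the outside world has actually seen

    def guest_step(self):
        # The guest makes progress and emits one packet, which is buffered.
        self.state += 1
        self.buffered.append(f"pkt-{self.state}")

    def checkpoint(self, secondary):
        # Sync state first, then release output that depends on that state.
        secondary.load(self.state)
        self.released.extend(self.buffered)
        self.buffered.clear()


def simulate(steps, steps_per_checkpoint):
    """Run the guest for `steps`, checkpointing every `steps_per_checkpoint`."""
    primary, secondary = ToyPrimary(), ToySecondary()
    for i in range(1, steps + 1):
        primary.guest_step()
        if i % steps_per_checkpoint == 0:
            primary.checkpoint(secondary)
    return primary, secondary


if __name__ == "__main__":
    p, s = simulate(steps=7, steps_per_checkpoint=3)
    print("secondary state:", s.state)
    print("released:", p.released)
    print("still buffered:", p.buffered)
```

In this model a packet is never visible to the client unless the secondary already holds the state that produced it, which is why a failover to the last checkpoint stays consistent from the client's point of view. A longer checkpoint period means fewer syncs, but output is held back for longer.
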