Virtual machine (VM) replication is a well-known technique for providing
application-agnostic, software-implemented hardware fault tolerance
("non-stop service"). COLO is a high availability solution built on this
technique. Both the primary VM (PVM) and the secondary VM (SVM) run in
parallel. They receive the same requests from the client and generate
responses in parallel too. If the response packets from the PVM and the SVM
are identical, they are released immediately. Otherwise, a VM checkpoint
(on demand) is conducted.

The idea was presented at Xen Summit 2012 and 2013, and in an academic
paper at SOCC 2013. It was also presented at KVM Forum 2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.

Please also refer to the previously posted RFC proposal:
http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
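
As a rough illustration of the release-or-checkpoint rule described above,
here is a minimal Python sketch. It is a toy model, not code from this
patchset: the class ColoModel and the method handle_response() are
hypothetical names, packets are plain byte strings, and the "checkpoint" is
only a counter. It also assumes that the PVM's packet is the one released
once a checkpoint has been taken, which the text above does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColoModel:
    """Toy model of COLO output comparison (hypothetical, for illustration)."""

    released: List[bytes] = field(default_factory=list)  # packets sent to the client
    checkpoints: int = 0  # number of on-demand checkpoints taken

    def handle_response(self, pvm_pkt: bytes, svm_pkt: bytes) -> bool:
        """Compare one response pair; return True if a checkpoint was triggered."""
        if pvm_pkt == svm_pkt:
            # Identical output: the packet can be released immediately.
            self.released.append(pvm_pkt)
            return False
        # Outputs diverged: take an on-demand checkpoint (modelled as a counter).
        self.checkpoints += 1
        # Assumption: after the checkpoint, the PVM's packet is the one released.
        self.released.append(pvm_pkt)
        return True


if __name__ == "__main__":
    model = ColoModel()
    model.handle_response(b"HTTP/1.1 200 OK", b"HTTP/1.1 200 OK")
    model.handle_response(b"seq=10", b"seq=11")
    print(model.checkpoints, len(model.released))
```

In this model a checkpoint happens only when the two outputs differ, which
is the point of the approach: as long as the client cannot tell the two VMs
apart, no synchronization is needed.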

The patchset is also hosted on github:
https://github.com/macrosheep/qemu/tree/colo_v0.5

v2:
  use QEMUSizedBuffer/QEMUFile as COLO buffer
  colo support is enabled by default
  add nic replication support
  addressed comments from Eric Blake and Dr. David Alan Gilbert

v1:
  implement the framework of colo

This patchset is an RFC, but it is ready to demo the COLO idea with
QEMU-KVM.

Steps using this patchset to get an overview of COLO:
1. configure
2. compile
3. just like QEMU's normal migration, run 2 QEMU VMs:
   - Primary VM
   - Secondary VM with -incoming tcp:[IP]:[PORT] option
4. on the Primary VM's QEMU monitor, run the following commands:
   migrate_set_capability colo on
   migrate tcp:[IP]:[PORT]
5. done
   you will see two running VMs; whenever you make changes to the PVM,
   the SVM will be synced to the PVM's state.

TODO list:
1. failover (will require heartbeat module:
   http://www.linux-ha.org/wiki/Downloads)
2. disk replication [COLO Disk manager]

Any comments/feedback are warmly welcomed.

Thanks,
Yang

Dr. David Alan Gilbert (1):
  QEMUSizedBuffer/QEMUFile

Yang Hongyang (22):
  configure: add CONFIG_COLO to switch COLO support
  COLO: introduce an api colo_supported() to indicate COLO support
  COLO migration: add a migration capability 'colo'
  COLO info: use colo info to tell migration target colo is enabled
  COLO save: integrate COLO checkpointed save into qemu migration
  COLO restore: integrate COLO checkpointed restore into qemu restore
  COLO: disable qdev hotplug
  COLO ctl: implement API's that communicate with colo agent
  COLO ctl: introduce is_slave() and is_master()
  COLO ctl: implement colo checkpoint protocol
  COLO ctl: add a RunState RUN_STATE_COLO
  COLO ctl: implement colo save
  COLO ctl: implement colo restore
  COLO save: reuse migration bitmap under colo checkpoint
  COLO ram cache: implement colo ram cache on slave
  HACK: trigger checkpoint every 500ms
  COLO nic: add command line switch
  COLO nic: init/remove colo nic devices when add/cleanup tap devices
  COLO nic: implement colo nic device interface support_colo()
  COLO nic: implement colo nic device interface configure()
  COLO nic: export colo nic APIs
  COLO nic: setup/teardown colo nic devices

 Makefile.objs                      |   2 +
 arch_init.c                        | 174 +++++++++++-
 configure                          |  14 +
 include/exec/cpu-all.h             |   1 +
 include/migration/migration-colo.h |  36 +++
 include/migration/migration.h      |  13 +
 include/migration/qemu-file.h      |  28 ++
 include/net/colo-nic.h             |  20 ++
 include/net/net.h                  |   4 +
 include/qemu/typedefs.h            |   1 +
 migration-colo-comm.c              |  78 ++++++
 migration-colo.c                   | 540 +++++++++++++++++++++++++++++++++++++
 migration.c                        |  47 ++--
 net/Makefile.objs                  |   1 +
 net/colo-nic.c                     | 227 ++++++++++++++++
 net/tap.c                          |  45 +++-
 network-colo                       | 194 +++++++++++++
 qapi-schema.json                   |  18 +-
 qemu-file.c                        | 410 ++++++++++++++++++++++++++++
 qemu-options.hx                    |  10 +-
 stubs/Makefile.objs                |   1 +
 stubs/migration-colo.c             |  34 +++
 vl.c                               |  12 +
 23 files changed, 1879 insertions(+), 31 deletions(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 include/net/colo-nic.h
 create mode 100644 migration-colo-comm.c
 create mode 100644 migration-colo.c
 create mode 100644 net/colo-nic.c
 create mode 100755 network-colo
 create mode 100644 stubs/migration-colo.c

--
1.9.1