* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> On 2015/8/24 22:38, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>This is the 8th version of COLO.
> >
> >I'm seeing an occasional error:
> >
> > pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.
> >
> >on the secondary; have you seen that?
> >
> >bus->irq_count[4] is -1 in my backtrace; it's
> >colo_process_incoming_checkpoints->qemu_devices_reset->qbus_walk_children->qbus_reset_one->pcibus_reset
> >
>
> No, we haven't come across such a problem. Is there anything special about
> your test? What's your command line?
> Did it happen during the first checkpoint process?
I was using e1000; it hasn't happened again since I switched to
virtio-net-pci, so I suspect it's the e1000 having an outstanding interrupt
while it's being reset.

block_param="-drive if=none,driver=raw,file=$disk_path,id=colo1,cache=none,aio=native \
-drive if=virtio,driver=replication,mode=secondary,throttling.bps-total-max=70000000,\
file.file.filename=$TMPDISKS/colo-active-disk.qcow2,\
file.driver=qcow2,\
file.backing.file.filename=$TMPDISKS/colo-hidden-disk.qcow2,\
file.backing.driver=qcow2,\
file.backing.allow-write-backing-file=on,\
file.backing.backing.backing_reference=colo1"

net_param="-netdev tap,id=hn0,script=$PWD/ifup-slave,\
downscript=$PWD/ifdown-slave,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,forward_nic=em4 \
-device virtio-net-pci,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0"

console_param="-chardev socket,id=hmpfeed,server,nowait,telnet,port=9999,host=localhost -mon hmpfeed -nographic -chardev stdio,mux=on,id=mon -mon chardev=mon,mode=readline --device isa-serial,chardev=mon"

./try/bin/qemu-system-x86_64 -enable-kvm $console_param \
 -boot c -m 4096 -smp 4 -machine pc-i440fx-2.3,accel=kvm -S \
 -name debug-threads=on -trace events=trace-file \
 -device virtio-rng-pci \
 $block_param $net_param \
 -incoming tcp:0:8888

Dave

>
> Thanks,
> zhanghailiang
>
> >Dave
> >
> >>This series contains only the COLO framework part, including VM checkpoint,
> >>failover, the proxy API and the block replication API, but not block
> >>replication itself; the block part is treated as a separate series.
> >>
> >>As usual, we provide 'basic' and 'developing' branches on GitHub:
> >>https://github.com/coloft/qemu/commits/colo-v1.5-basic
> >>https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)
> >>
> >>The 'basic' branch is exactly the same as this patch series;
> >>we will keep this series as simple as possible to make review easy.
> >>
> >>The extra features in the colo-v1.5-developing branch:
> >>1) Separate the RAM and device save/load processes to reduce the amount of
> >>extra memory used during a checkpoint.
> >>2) Live migrate part of the dirty pages to the slave during sleep time.
> >>3) Get statistics about checkpoints with the 'info migrate' command.
> >>
> >>Please refer to the following link to test COLO:
> >>http://wiki.qemu.org/Features/COLO
> >>
> >>COLO is a totally new feature which is still at an early stage;
> >>your comments and feedback are warmly welcomed.
> >>
> >>NOTE:
> >>We have decided to re-implement the colo proxy in userspace (in qemu,
> >>to be exact).
> >>You can find the discussion about why and how to implement the colo proxy
> >>in qemu at the following link:
> >>http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html
> >>
> >>TODO:
> >>1. COLO function switch on/off
> >>2. The capability of continuous FT
> >>3. Optimize the performance.
> >>
> >>v8:
> >>- Move some global variables into MigrationIncomingState and MigrationState
> >>- Move some cleanup work from the colo thread and colo incoming thread into
> >>  the failover BH function, and also fix the code logic for the cleanup work.
> >>- Fix the bug that the colo thread and colo incoming thread could block in
> >>  the socket 'recv' call during failover.
> >>- Optimize colo_flush_ram_cache()
> >>- Add a migration state for the incoming side; we use the state to verify
> >>  whether the incoming side is in COLO state or not (Patch 5).
> >>- Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode',
> >>  since it is not correct.
> >>
> >>zhanghailiang (34):
> >>  configure: Add parameter for configure to enable/disable COLO support
> >>  migration: Introduce capability 'colo' to migration
> >>  COLO: migrate colo related info to slave
> >>  colo-comm/migration: skip colo info section for special cases
> >>  migration: Add state records for migration incoming
> >>  migration: Integrate COLO checkpoint process into migration
> >>  migration: Integrate COLO checkpoint process into loadvm
> >>  COLO: Implement colo checkpoint protocol
> >>  COLO: Add a new RunState RUN_STATE_COLO
> >>  QEMUSizedBuffer: Introduce two help functions for qsb
> >>  COLO: Save VM state to slave when do checkpoint
> >>  COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
> >>  COLO VMstate: Load VM state into qsb before restore it
> >>  arch_init: Start to trace dirty pages of SVM
> >>  COLO RAM: Flush cached RAM into SVM's memory
> >>  COLO failover: Introduce a new command to trigger a failover
> >>  COLO failover: Introduce state to record failover process
> >>  COLO failover: Implement COLO primary/secondary vm failover work
> >>  qmp event: Add event notification for COLO error
> >>  COLO failover: Don't do failover during loading VM's state
> >>  COLO: Add new command parameter 'forward_nic' 'colo_script' for net
> >>  COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
> >>  tap: Make launch_script() public
> >>  COLO NIC: Implement colo nic device interface configure()
> >>  colo-nic: Handle secondary VM's original net device configure
> >>  COLO NIC: Implement colo nic init/destroy function
> >>  COLO NIC: Some init work related with proxy module
> >>  COLO: Handle nfnetlink message from proxy module
> >>  COLO: Do checkpoint according to the result of packets comparation
> >>  COLO: Improve checkpoint efficiency by do additional periodic
> >>    checkpoint
> >>  COLO: Add colo-set-checkpoint-period command
> >>  COLO NIC: Implement NIC checkpoint and failover
> >>  COLO: Implement shutdown checkpoint
> >>  COLO: Add block replication into colo process
> >>
> >> configure                     |  33 +-
> >> docs/qmp/qmp-events.txt       |  16 +
> >> hmp-commands.hx               |  30 ++
> >> hmp.c                         |  15 +
> >> hmp.h                         |   2 +
> >> include/exec/cpu-all.h        |   1 +
> >> include/migration/colo.h      |  45 +++
> >> include/migration/failover.h  |  33 ++
> >> include/migration/migration.h |  19 +
> >> include/migration/qemu-file.h |   3 +-
> >> include/net/colo-nic.h        |  37 ++
> >> include/net/net.h             |   2 +
> >> include/net/tap.h             |  19 +
> >> include/sysemu/sysemu.h       |   3 +
> >> migration/Makefile.objs       |   2 +
> >> migration/colo-comm.c         |  75 ++++
> >> migration/colo-failover.c     |  83 +++++
> >> migration/colo.c              | 805 ++++++++++++++++++++++++++++++++++++++++++
> >> migration/migration.c         | 116 ++++--
> >> migration/qemu-file-buf.c     |  58 +++
> >> migration/ram.c               | 242 ++++++++++++-
> >> migration/savevm.c            |   2 +-
> >> net/Makefile.objs             |   1 +
> >> net/colo-nic.c                | 457 ++++++++++++++++++++++++
> >> net/net.c                     |   2 +
> >> net/tap.c                     |  90 +++--
> >> qapi-schema.json              |  58 ++-
> >> qapi/event.json               |  15 +
> >> qemu-options.hx               |   7 +
> >> qmp-commands.hx               |  42 +++
> >> scripts/colo-proxy-script.sh  | 145 ++++++++
> >> stubs/Makefile.objs           |   1 +
> >> stubs/migration-colo.c        |  58 +++
> >> trace-events                  |  10 +
> >> vl.c                          |  37 +-
> >> 35 files changed, 2474 insertions(+), 90 deletions(-)
> >> create mode 100644 include/migration/colo.h
> >> create mode 100644 include/migration/failover.h
> >> create mode 100644 include/net/colo-nic.h
> >> create mode 100644 migration/colo-comm.c
> >> create mode 100644 migration/colo-failover.c
> >> create mode 100644 migration/colo.c
> >> create mode 100644 net/colo-nic.c
> >> create mode 100755 scripts/colo-proxy-script.sh
> >> create mode 100644 stubs/migration-colo.c
> >>
> >>-- 
> >>1.8.3.1
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
> >.
> >
>

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK