* Li Zhijian (lizhij...@cn.fujitsu.com) wrote: > Hi all, > > Does anyboday remember the similar issue post by hailiang months ago > http://patchwork.ozlabs.org/patch/454322/ > At least tow bugs about migration had been fixed since that.
Yes, I wondered what happened to that. > And now we found the same issue at the tcg vm(kvm is fine), after migration, > the content VM's memory is inconsistent. Hmm, TCG only - I don't know much about that; but I guess something must be accessing memory without using the proper macros/functions so it doesn't mark it as dirty. > we add a patch to check memory content, you can find it from affix > > steps to reporduce: > 1) apply the patch and re-build qemu > 2) prepare the ubuntu guest and run memtest in grub. > soruce side: > x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device > e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive > if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device > virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > -vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp > tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine > pc-i440fx-2.3,accel=tcg,usb=off > > destination side: > x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device > e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive > if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device > virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > -vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp > tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine > pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 > > 3) start migration > with 1000M NIC, migration will finish within 3 min. > > at source: > (qemu) migrate tcp:192.168.2.66:8881 > after saving ram complete > e9e725df678d392b1a83b3a917f332bb > qemu-system-x86_64: end ram md5 > (qemu) > > at destination: > ...skip... > Completed load of VM with exit code 0 seq iteration 1264 > Completed load of VM with exit code 0 seq iteration 1265 > Completed load of VM with exit code 0 seq iteration 1266 > qemu-system-x86_64: after loading state section id 2(ram) > 49c2dac7bde0e5e22db7280dcb3824f9 > qemu-system-x86_64: end ram md5 > qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init > > 49c2dac7bde0e5e22db7280dcb3824f9 > qemu-system-x86_64: end ram md5 > > This occurs occasionally and only at tcg machine. It seems that > some pages dirtied in source side don't transferred to destination. > This problem can be reproduced even if we disable virtio. > > Is it OK for some pages that not transferred to destination when do > migration ? Or is it a bug? I'm pretty sure that means it's a bug. Hard to find though, I guess at least memtest is smaller than a big OS. I think I'd dump the whole of memory on both sides, hexdump and diff them - I'd guess it would just be one byte/word different, maybe that would offer some idea what wrote it. Dave > Any idea... > > =================md5 check patch============================= > > diff --git a/Makefile.target b/Makefile.target > index 962d004..e2cb8e9 100644 > --- a/Makefile.target > +++ b/Makefile.target > @@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o > obj-y += memory_mapping.o > obj-y += dump.o > obj-y += migration/ram.o migration/savevm.o > -LIBS := $(libs_softmmu) $(LIBS) > +LIBS := $(libs_softmmu) $(LIBS) -lplumb > > # xen support > obj-$(CONFIG_XEN) += xen-common.o > diff --git a/migration/ram.c b/migration/ram.c > index 1eb155a..3b7a09d 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int > version_id) > } > > rcu_read_unlock(); > - DPRINTF("Completed load of VM with exit code %d seq iteration " > + fprintf(stderr, "Completed load of VM with exit code %d seq iteration " > "%" PRIu64 "\n", ret, seq_iter); > return ret; > } > diff --git a/migration/savevm.c b/migration/savevm.c > index 0ad1b93..3feaa61 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) > > } > > +#include "exec/ram_addr.h" > +#include "qemu/rcu_queue.h" > +#include <clplumbing/md5.h> > +#ifndef MD5_DIGEST_LENGTH > +#define MD5_DIGEST_LENGTH 16 > +#endif > + > +static void check_host_md5(void) > +{ > + int i; > + unsigned char md[MD5_DIGEST_LENGTH]; > + rcu_read_lock(); > + RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check > 'pc.ram' block */ > + rcu_read_unlock(); > + > + MD5(block->host, block->used_length, md); > + for(i = 0; i < MD5_DIGEST_LENGTH; i++) { > + fprintf(stderr, "%02x", md[i]); > + } > + fprintf(stderr, "\n"); > + error_report("end ram md5"); > +} > + > void qemu_savevm_state_begin(QEMUFile *f, > const MigrationParams *params) > { > @@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, > bool iterable_only) > save_section_header(f, se, QEMU_VM_SECTION_END); > > ret = se->ops->save_live_complete_precopy(f, se->opaque); > + > + fprintf(stderr, "after saving %s complete\n", se->idstr); > + check_host_md5(); > + > trace_savevm_section_end(se->idstr, se->section_id, ret); > save_section_footer(f, se); > if (ret < 0) { > @@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, > MigrationIncomingState *mis) > section_id, le->se->idstr); > return ret; > } > + if (section_type == QEMU_VM_SECTION_END) { > + error_report("after loading state section id %d(%s)", > + section_id, le->se->idstr); > + check_host_md5(); > + } > if (!check_section_footer(f, le)) { > return -EINVAL; > } > @@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) > } > > cpu_synchronize_all_post_init(); > + error_report("%s: after cpu_synchronize_all_post_init\n", __func__); > + check_host_md5(); > > return ret; > } > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK