On 03/25/2015 05:50 PM, Juan Quintela wrote:
> zhanghailiang <zhang.zhanghaili...@huawei.com> wrote:
>> Hi all,
>>
>> We found that, sometimes, the content of the VM's memory is inconsistent
>> between the source side and the destination side when we check it just
>> after finishing migration but before the VM continues to run.
>>
>> We use a patch like the one below to find this issue; you can find it in
>> the attachment. Steps to reproduce:
>>
>> (1) Compile QEMU:
>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
>>
>> (2) Command and output:
>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock
>> -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c
>> -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>> -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>
> Could you try to reproduce:
> - without vhost
> - without virtio-net
> - cache=unsafe is going to give you trouble, but trouble should only
>   happen after the migration of pages has finished.
If I use an IDE disk, it doesn't happen. Even if I use virtio-net with
vhost=on, it still doesn't happen. I guess that is because I migrate the
guest while it is booting, so the virtio-net device is not used in this case.

Thanks
Wen Congyang

>
> What kind of load were you having when reproducing this issue?
> Just to confirm, you have been able to reproduce this without the COLO
> patches, right?
>
>> (qemu) migrate tcp:192.168.3.8:3004
>> before saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>> md_host : after saving ram complete
>> ff703f6889ab8701e4e040872d079a28
>>
>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock
>> -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0
>> -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
>> -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
>> -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor
>> stdio -incoming tcp:0:3004
>> (qemu) QEMU_VM_SECTION_END, after loading ram
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after loading all vmstate
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>> md_host : after cpu_synchronize_all_post_init
>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>
>> This happens occasionally, and it is easier to reproduce when the migrate
>> command is issued during the VM's startup.
>
> OK, a couple of things. Memory doesn't have to be exactly identical.
> Virtio devices in particular do funny things on "post-load". There are
> no guarantees for that as far as I know; we should end up with an
> equivalent device state in memory.
>
>> We have done further tests and found that some pages have been dirtied but
>> their corresponding bits in migration_bitmap are not set.
>> We can't figure out which module of QEMU has missed setting the bitmap when
>> dirtying the VM's pages; it is very difficult for us to trace all the
>> actions that dirty the VM's pages.
>
> This seems to point to a bug in one of the devices.
>
>> Actually, the first time we found this problem was during COLO FT
>> development, and it triggered some strange issues in the VM which all
>> pointed to inconsistencies in the VM's memory. (We have tried saving all of
>> the VM's memory to the slave side every time we do a checkpoint in COLO FT,
>> and then everything is OK.)
>>
>> Is it OK for some pages not to be transferred to the destination when doing
>> migration? Or is it a bug?
>
> Pages transferred should be the same; it is after the device state
> transmission that things could change.
>
>> This issue has blocked our COLO development... :(
>>
>> Any help will be greatly appreciated!
>
> Later, Juan.
>
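
For reference, the debug patch mentioned above was sent as an attachment and
is not included in this excerpt. A minimal sketch of the general technique it
relies on, hashing guest RAM with OpenSSL MD5 and printing the digest under a
tag at points such as "after saving ram complete" and "after loading ram",
might look like the following. The helper name md_host_dump, the demo buffer,
and the idea of calling it with each RAM block's host pointer and length are
illustrative assumptions, not the actual patch; OpenSSL's MD5 routines live in
libcrypto, which is why the reproduce steps add extra SSL link flags.

/* Hypothetical debug helper (not the patch from the attachment): hash a host
 * memory range with OpenSSL MD5 and print the digest under a tag, so that the
 * output on the source and destination can be compared line by line.
 * Build stand-alone with: gcc md_host_demo.c -lcrypto
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <openssl/md5.h>

static void md_host_dump(const char *tag, const uint8_t *host, size_t len)
{
    MD5_CTX ctx;
    unsigned char digest[MD5_DIGEST_LENGTH];
    int i;

    MD5_Init(&ctx);
    MD5_Update(&ctx, host, len);
    MD5_Final(digest, &ctx);

    printf("md_host : %s\n", tag);
    for (i = 0; i < MD5_DIGEST_LENGTH; i++) {
        printf("%02x", digest[i]);
    }
    printf("\n");
}

int main(void)
{
    /* Stand-alone demo only: in a real debug patch the buffer would be guest
     * RAM, e.g. each RAM block's host pointer and length (the exact RAMBlock
     * field names are version-dependent assumptions).
     */
    static uint8_t fake_ram[4096];

    md_host_dump("after loading ram (demo buffer)", fake_ram, sizeof(fake_ram));
    return 0;
}

In the actual patch such a helper would presumably be called from the
migration code on both sides, for example around ram_save_complete() on the
source and after the RAM sections are loaded and after
cpu_synchronize_all_post_init() on the destination, iterating over the guest's
RAM blocks; the exact call sites depend on the QEMU version.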