Wen Congyang <we...@cn.fujitsu.com> wrote:
> On 03/25/2015 05:50 PM, Juan Quintela wrote:
>> zhanghailiang <zhang.zhanghaili...@huawei.com> wrote:
>>> Hi all,
>>>
>>> We found that, sometimes, the content of the VM's memory is
>>> inconsistent between the source side and the destination side when we
>>> check it just after finishing migration but before the VM continues
>>> to run.
>>>
>>> We used a patch like the one below to find this issue; you can find
>>> it in the attachment. Steps to reproduce:
>>>
>>> (1) Compile QEMU:
>>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
>>>
>>> (2) Command and output:
>>> SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
>>
>> Could you try to reproduce:
>> - without vhost
>> - without virtio-net
>> - cache=unsafe is going to give you trouble, but trouble should only
>>   happen after migration of pages has finished.
>
> If I use an IDE disk, it doesn't happen.
> Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> it is because I migrate the guest while it is booting. The virtio-net
> device is not used in this case.
Kevin, Stefan, Michael, any great idea?

Thanks, Juan.

>
> Thanks
> Wen Congyang
>
>>
>> What kind of load were you having when reproducing this issue?
>> Just to confirm, you have been able to reproduce this without the COLO
>> patches, right?
>>
>>> (qemu) migrate tcp:192.168.3.8:3004
>>> before saving ram complete
>>> ff703f6889ab8701e4e040872d079a28
>>> md_host : after saving ram complete
>>> ff703f6889ab8701e4e040872d079a28
>>>
>>> DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -incoming tcp:0:3004
>>> (qemu) QEMU_VM_SECTION_END, after loading ram
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>> md_host : after loading all vmstate
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>> md_host : after cpu_synchronize_all_post_init
>>> 230e1e68ece9cd4e769630e1bcb5ddfb
>>>
>>> This happens occasionally, and it is easier to reproduce when the
>>> migrate command is issued during the VM's startup time.
>>
>> OK, a couple of things. Memory doesn't have to be exactly identical.
>> Virtio devices in particular do funny things on "post-load". There
>> are no guarantees for that as far as I know; we should end up with an
>> equivalent device state in memory.
>>
>>> We have done further tests and found that some pages have been
>>> dirtied but their corresponding migration_bitmap bits are not set.
>>> We can't figure out which module of QEMU misses setting the bitmap
>>> when dirtying the VM's pages; it is very difficult for us to trace
>>> all the actions that dirty the VM's pages.
>>
>> This seems to point to a bug in one of the devices.
>>
>>> Actually, the first time we found this problem was during COLO FT
>>> development, and it triggered some strange issues in the VM, all of
>>> which pointed to inconsistency of the VM's memory. (We have tried
>>> saving all of the VM's memory to the slave side every time we do a
>>> checkpoint in COLO FT, and then everything is OK.)
>>>
>>> Is it OK for some pages not to be transferred to the destination
>>> during migration? Or is it a bug?
>>
>> Pages transferred should be the same; it is after device state
>> transmission that things could change.
>>
>>> This issue has blocked our COLO development... :(
>>>
>>> Any help will be greatly appreciated!
>>
>> Later, Juan.
>>
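The debug patch itself is only in the attachment; below is a minimal, self-contained sketch of the kind of RAM digest check it appears to perform (which would explain the --extra-ldflags="-lssl" in the build step), printing lines like the "md_host : ..." output quoted above. It uses OpenSSL's MD5_Init/MD5_Update/MD5_Final; the md5_of_region helper and the stand-in buffer are hypothetical, and the RAMBlock fields mentioned in the comments (block->host, block->used_length) are an assumption about the 2015-era QEMU internals the real hook would walk, not code from the patch.

/*
 * Sketch only: illustrates hashing a memory range the way the debug
 * patch presumably hashes guest RAM at "before saving ram complete" /
 * "after loading all vmstate".  Guest RAM is replaced by a calloc'd
 * buffer so this compiles standalone:  gcc ram_md5.c -lcrypto
 */
#include <stdio.h>
#include <stdlib.h>
#include <openssl/md5.h>

/* Compute an MD5 hex digest over a memory range. */
static void md5_of_region(const unsigned char *buf, size_t len,
                          char out[2 * MD5_DIGEST_LENGTH + 1])
{
    unsigned char digest[MD5_DIGEST_LENGTH];
    MD5_CTX ctx;

    MD5_Init(&ctx);
    MD5_Update(&ctx, buf, len);
    MD5_Final(digest, &ctx);

    for (int i = 0; i < MD5_DIGEST_LENGTH; i++) {
        sprintf(out + 2 * i, "%02x", digest[i]);
    }
}

int main(void)
{
    /* Stand-in for one guest RAM block; a real in-QEMU hook would
     * hash block->host over block->used_length for every RAMBlock
     * (field names are an assumption about that QEMU version). */
    size_t len = 4096;
    unsigned char *ram = calloc(1, len);
    char hex[2 * MD5_DIGEST_LENGTH + 1];

    md5_of_region(ram, len, hex);
    printf("md_host : %s\n", hex);

    free(ram);
    return 0;
}

Comparing such digests taken at the same logical point on both sides (e.g. "after saving ram complete" vs. "after loading all vmstate") is what exposes the mismatch shown in the log above.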