* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> On 2015/3/30 15:59, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
> >>>* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:
> >>>>On 2015/3/26 11:52, Li Zhijian wrote:
> >>>>>On 03/26/2015 11:12 AM, Wen Congyang wrote:
> >>>>>>On 03/25/2015 05:50 PM, Juan Quintela wrote:
> >>>>>>>zhanghailiang <zhang.zhanghaili...@huawei.com> wrote:
> >>>>>>>>Hi all,
> >>>>>>>>
> >>>>>>>>We found that, sometimes, the content of the VM's memory is inconsistent
> >>>>>>>>between the source side and the destination side
> >>>>>>>>when we check it just after finishing migration but before the VM
> >>>>>>>>continues to run.
> >>>>>>>>
> >>>>>>>>We used a patch like the one below to find this issue; you can find it
> >>>>>>>>in the attachment. Steps to reproduce:
> >>>>>>>>
> >>>>>>>>(1) Compile QEMU:
> >>>>>>>> ./configure --target-list=x86_64-softmmu --extra-ldflags="-lssl" && make
> >>>>>>>>
> >>>>>>>>(2) Command and output:
> >>>>>>>>SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock
> >>>>>>>>-netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c
> >>>>>>>>-drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>>>>>>>-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>>>>>>>-vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
> >>>>>>>
> >>>>>>>Could you try to reproduce:
> >>>>>>>- without vhost
> >>>>>>>- without virtio-net
> >>>>>>>- cache=unsafe is going to give you trouble, but trouble should only
> >>>>>>>  happen after migration of pages has finished.
> >>>>>>
> >>>>>>If I use an IDE disk, it doesn't happen.
> >>>>>>Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
> >>>>>>it is because I migrate the guest while it is booting. The virtio net
> >>>>>>device is not used in this case.
> >>>>>
> >>>>>Er?
> >>>>>It reproduces with my IDE disk.
> >>>>>There is no virtio device at all; my command line is like below:
> >>>>>
> >>>>>x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
> >>>>>-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
> >>>>>usb=off -no-user-config -nodefaults -monitor stdio -vga std
> >>>>>
> >>>>>It seems easy to reproduce this issue with the following steps in an
> >>>>>_ubuntu_ guest:
> >>>>>1. on the source side, choose memtest in grub
> >>>>>2. do live migration
> >>>>>3. exit memtest (press Esc while memory is being tested)
> >>>>>4. wait for the migration to complete
> >>>>>
> >>>>
> >>>>Yes, it is a thorny problem. It is indeed easy to reproduce, just by
> >>>>following your steps above.
> >>>>
> >>>>This is my test result: (I also tested accel=tcg; it can be reproduced
> >>>>there as well.)
> >>>>Source side:
> >>>># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off
> >>>>-no-user-config -nodefaults -cpu qemu64,-kvmclock -boot c
> >>>>-drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw
> >>>>-device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>>>(qemu) ACPI_BUILD: init ACPI tables
> >>>>ACPI_BUILD: init ACPI tables
> >>>>migrate tcp:9.61.1.8:3004
> >>>>ACPI_BUILD: init ACPI tables
> >>>>before cpu_synchronize_all_states
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>md_host : before saving ram complete
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>md_host : after saving ram complete
> >>>>5a8f72d66732cac80d6a0d5713654c0e
> >>>>(qemu)
> >>>>
> >>>>Destination side:
> >>>># x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off
> >>>>-no-user-config -nodefaults -cpu qemu64,-kvmclock -boot c
> >>>>-drive file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw
> >>>>-device cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
> >>>>-incoming tcp:0:3004
> >>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>>md_host : after loading all vmstate
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>>md_host : after cpu_synchronize_all_post_init
> >>>>d7cb0d8a4bdd1557fb0e78baee50c986
> >>>
> >>>Hmm, that's not good. I suggest you md5 each of the RAMBlocks
> >>>individually, to see if it's main RAM that's different or something
> >>>more subtle like video RAM.
> >>>
> >>
> >>Er, all my previous tests md5'd the 'pc.ram' block only.
> >>
> >>>But then maybe it's easier just to dump the whole of RAM to file
> >>>and byte compare it (hexdump the two dumps and diff?)
> >>
> >>Hmm, we also used memcmp to compare every page, but the addresses
> >>seem to be random.
> >>
> >>Besides, in our previous tests, we found it seems to be easier to reproduce
> >>when migration occurs during the VM's start-up or reboot process.
> >>
> >>Is it possible that some devices get special treatment during VM start-up
> >>which may miss setting the dirty bitmap?
> >
> >I don't think there should be, but the code paths used during startup are
> >probably much less tested with migration. I'm sure the startup code
> >uses different parts of the device emulation. I do know we have some bugs
>
> Er, maybe there is a special case:
>
> During the VM's start-up, I found that the KVM slots changed many times; it
> was a process of smashing the total memory space into smaller slots.
>
> If some pages were dirtied and their bits had been set in the KVM module's
> dirty bitmap, but we didn't sync the bitmaps back to QEMU user space before
> the slot was smashed (its previous bitmap being destroyed with it), then the
> dirty pages recorded in the previous KVM slot may be missed.
>
> What's your opinion? Can the situation I described above happen?
>
> The log below was grabbed when I tried to figure out a very similar problem
> (some pages missing their dirty-bitmap setting) that we found in COLO:
> Occasionally, there will be an error report on the SLAVE side:
>
>     qemu: warning: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
>     qemu-system-x86_64: loadvm failed
>
> We found that it relates to three addresses (gpa: 0xca000, 0xcb000, 0xcc000,
> which are the addresses of 'kvmvapic.rom'?), and sometimes their
> corresponding dirty bits are missed on the master side, because their
> KVMSlot is destroyed before we sync its dirty bitmap to QEMU.
>
> (I'm still not quite sure whether this can also happen in common migration;
> I will try to test it in normal migration.)
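To make the suspected sequence concrete, here is a minimal, self-contained C
model of the ordering problem being described (a sketch with hypothetical
names, not actual KVM or QEMU code): if a slot's dirty bitmap is freed before
user space harvests it, any dirty bits recorded there are silently lost, and
migration never resends those pages.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a KVM memory slot with a per-page dirty bitmap. */
struct memslot {
    unsigned long base_gfn;
    unsigned long npages;
    unsigned char *dirty_bitmap;        /* one bit per page */
};

/* KVM side: record that a guest page was written. */
static void mark_dirty(struct memslot *s, unsigned long gfn)
{
    unsigned long rel = gfn - s->base_gfn;
    s->dirty_bitmap[rel / 8] |= 1u << (rel % 8);
}

/* User-space side: harvest and clear the slot's dirty bits
 * (what a migration bitmap sync would do). */
static void sync_dirty_log(struct memslot *s, unsigned char *user_bitmap)
{
    if (!s->dirty_bitmap) {
        return;                         /* slot already smashed: nothing left */
    }
    for (unsigned long i = 0; i < (s->npages + 7) / 8; i++) {
        user_bitmap[i] |= s->dirty_bitmap[i];
        s->dirty_bitmap[i] = 0;
    }
}

/* Slot is destroyed, e.g. while start-up code reshapes the memory map. */
static void destroy_slot(struct memslot *s)
{
    free(s->dirty_bitmap);              /* the dirty bits die with the slot */
    s->dirty_bitmap = NULL;
}

int main(void)
{
    struct memslot s = { .base_gfn = 0xc0, .npages = 64 };
    unsigned char user_bitmap[8] = { 0 };

    s.dirty_bitmap = calloc((s.npages + 7) / 8, 1);
    mark_dirty(&s, 0xca);               /* guest dirties a kvmvapic.rom page */
    destroy_slot(&s);                   /* slot smashed before any sync runs */
    sync_dirty_log(&s, user_bitmap);    /* the sync comes too late */

    /* gfn 0xca is bit 2 of byte 1; this prints 0, i.e. the bit was lost. */
    printf("gfn 0xca dirty bit seen by user space: %d\n",
           (user_bitmap[1] >> 2) & 1);
    return 0;
}

Under this model, the fix direction would be to sync (or merge forward) a
slot's dirty bits before the slot is destroyed, which matches the "sync
before smashing" concern raised above.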
Ah, kvm-tpr-opt. I also hit this today with COLO; it didn't happen on a new
machine, but it does happen on an older machine, or in this case a nested
setup.

    -global kvm-apic.vapic=false

turns the vapic off, and seems to have got me further. I don't understand the
details of kvm-tpr-opt, but it's a hack for speeding up access to the TPR
(Task Priority Register), which apparently Windows accesses a lot and which
used to slow things down. The hack does involve modifying some stuff on the
fly, and it doesn't happen if the host has some newer features (TPR shadow, I
think; also look for flexpriority). (A full command-line example using this
option follows the patches below.)

Dave

>
> Thanks,
> zhanghailiang
>
> /var/log/message:
> Mar 31 18:32:53 master kernel: [15908.524721] memslot dirty bitmap of '4' was destroyed, base_gfn 0x100, npages 524032
> Mar 31 18:32:53 master kernel: [15908.657853] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.665105] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.672360] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.682058] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfebc0, npages 16
> Mar 31 18:32:53 master kernel: [15908.849527] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.856845] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.864161] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.872676] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfeb80, npages 64
> Mar 31 18:32:53 master kernel: [15908.882694] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.889948] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.897202] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.911652] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.919543] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.927419] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.935317] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.943209] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.951083] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.958971] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.966837] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.974707] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
> Mar 31 18:32:53 master kernel: [15908.988470] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 524096
> Mar 31 18:32:53 master kernel: [15909.002403] kvm: zapping shadow pages for mmio generation wraparound
> Mar 31 18:32:53 master kernel: [15909.002523] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 8
> Mar 31 18:32:53 master kernel: [15909.010110] memslot dirty bitmap of '4' was destroyed, base_gfn 0xc8, npages 524088
> Mar 31 18:32:53 master kernel: [15909.023988] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 3
> Mar 31 18:32:53 master kernel: [15909.031594] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd0, npages 524080
> Mar 31 18:32:53 master kernel: [15909.044708] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 11
> Mar 31 18:32:53 master kernel: [15909.052392] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd8, npages 524072
> Mar 31 18:32:53 master kernel: [15909.065651] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 19
> Mar 31 18:32:53 master kernel: [15909.073329] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe0, npages 524064
> Mar 31 18:32:53 master kernel: [15909.086379] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 27
> Mar 31 18:32:53 master kernel: [15909.094084] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe8, npages 524056
> Mar 31 18:32:53 master kernel: [15909.107354] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 524052
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: timed out
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: trying to use old lease in `/var/lib/dhcpcd/dhcpcd-eth0.info'
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: lease expired 30265 seconds ago
> Mar 31 18:32:54 master dhcpcd[5408]: eth0: broadcasting for a lease
> Mar 31 18:33:02 master qemu-system-x86_64: ====qemu: The 0 times do checkpoing===
> Mar 31 18:33:02 master qemu-system-x86_64: == migration_bitmap_sync ==
> Mar 31 18:33:02 master qemu-system-x86_64: qemu: --- addr=0xca000 (hva=0x7f4fa32ca000), who's bitmap not set ---
> Mar 31 18:33:02 master qemu-system-x86_64: qemu: --- addr=0xcb000 (hva=0x7f4fa32cb000), who's bitmap not set ---
> Mar 31 18:33:02 master qemu-system-x86_64: qemu: --- addr=0xcc000 (hva=0x7f4fa32cc000), who's bitmap not set ---
> Mar 31 18:33:05 master kernel: [15921.057246] device eth2 left promiscuous mode
> Mar 31 18:33:07 master avahi-daemon[5773]: Withdrawing address record for fe80::90be:cfff:fe9e:f03e on tap0.
> Mar 31 18:33:07 master kernel: [15922.513313] br0: port 2(tap0) entered disabled state
> Mar 31 18:33:07 master kernel: [15922.513480] device tap0 left promiscuous mode
> Mar 31 18:33:07 master kernel: [15922.513508] br0: port 2(tap0) entered disabled state
> Mar 31 18:33:07 master kernel: [15922.562591] memslot dirty bitmap of '8' was destroyed, base_gfn 0x100, npages 524032
> Mar 31 18:33:07 master kernel: [15922.570948] memslot dirty bitmap of '3' was destroyed, base_gfn 0xfd000, npages 4096
> Mar 31 18:33:07 master kernel: [15922.578967] memslot dirty bitmap of '0' was destroyed, base_gfn 0x0, npages 160
> Mar 31 18:33:07 master kernel: [15922.586556] memslot dirty bitmap of '1' was destroyed, base_gfn 0xfffc0, npages 64
> Mar 31 18:33:07 master kernel: [15922.586574] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 31
> Mar 31 18:33:07 master kernel: [15922.586575] memslot dirty bitmap of '7' was destroyed, base_gfn 0xf0, npages 16
> Mar 31 18:33:07 master kernel: [15922.586577] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 10
> Mar 31 18:33:07 master kernel: [15922.586578] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 4
> Mar 31 18:33:07 master kernel: [15922.586579] memslot dirty bitmap of '4' was destroyed, base_gfn 0xca, npages 3
>
> PS:
> QEMU:
> static void migration_bitmap_sync(void)
> {
>     trace_migration_bitmap_sync_end(migration_dirty_pages
>                                     - num_dirty_pages_init);
>     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
>     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +   syslog(LOG_INFO, "== migration_bitmap_sync ==");
> +   check_bitmap();
>
> +void check_bitmap(void)
> +{
> +    int i;
> +    char *host;
> +    ram_addr_t addr[3] = { 0xca000, 0xcb000, 0xcc000 };
> +    RAMBlock *block = NULL;
> +    int ret;
> +
> +    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +    for (i = 0; i < 3; i++) {
> +        unsigned long nr = (block->mr->ram_addr + addr[i]) >> TARGET_PAGE_BITS;
> +        host = memory_region_get_ram_ptr(block->mr) + addr[i];
> +        ret = test_bit(nr, migration_bitmap);
> +        if (ret == 0) {
> +            syslog(LOG_INFO, "qemu: --- addr=0x%llx (hva=%p), who's bitmap not set ---\n",
> +                   addr[i], host);
> +        } else {
> +            syslog(LOG_INFO, "qemu: +++ OK, addr=0x%llx (hva=%p), who's bitmap is set +++\n",
> +                   addr[i], host);
> +        }
> +    }
> +}
>
> KVM:
> static void mark_page_dirty_in_slot(struct kvm *kvm,
>                                     struct kvm_memory_slot *memslot,
>                                     gfn_t gfn)
> {
> +   if ((gfn == 0xca || gfn == 0xcb || gfn == 0xcc) && !memslot->dirty_bitmap) {
> +       printk("oops, dirty_bitmap is null, gfn=0x%llx\n", gfn);
> +   }
>     if (memslot && memslot->dirty_bitmap) {
>         unsigned long rel_gfn = gfn - memslot->base_gfn;
> +       if (gfn == 0xca || gfn == 0xcb || gfn == 0xcc) {
> +           printk("+++ gfn=0x%llx is marked dirty, id=%d, base_gfn=0x%llx, npages=%d\n",
> +                  gfn, memslot->id, memslot->base_gfn, memslot->npages);
> +       }
>         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>     }
> }
>
> static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> {
>     if (!memslot->dirty_bitmap)
>         return;
> +   if (memslot->id != 9) {
> +       printk("memslot dirty bitmap of '%d' was destroyed, base_gfn 0x%llx, npages %lld\n",
> +              memslot->id, memslot->base_gfn, memslot->npages);
> +   }
>     kvm_kvfree(memslot->dirty_bitmap);
>     memslot->dirty_bitmap = NULL;
> }
>
> >filed against migration during Windows boot; I'd not considered that it might
> >be devices not updating the bitmap.
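For reference, Dave's suggested workaround from earlier in the thread amounts
to adding a single -global option to the guest's command line. This is an
illustrative sketch only: the disk image path and the remaining options are
placeholders borrowed from the reproduction commands above, not a verified
configuration:

    x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock \
        -global kvm-apic.vapic=false \
        -boot c -drive file=/home/lizj/ubuntu.raw \
        -vnc :7 -m 2048 -smp 2 -monitor stdio

With the vapic disabled, the kvmvapic.rom pages at gpa 0xca000-0xcc000 should
no longer be patched at run time, which, if the analysis above is right,
removes exactly the writes whose dirty bits were being lost.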
> >
> >Dave
> >
> >>
> >>
> >>Thanks,
> >>zhanghailiang
> >>
> >>
> >>>>>>
> >>>>>>>What kind of load were you having when reproducing this issue?
> >>>>>>>Just to confirm, you have been able to reproduce this without COLO
> >>>>>>>patches, right?
> >>>>>>>
> >>>>>>>>(qemu) migrate tcp:192.168.3.8:3004
> >>>>>>>>before saving ram complete
> >>>>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>>>md_host : after saving ram complete
> >>>>>>>>ff703f6889ab8701e4e040872d079a28
> >>>>>>>>
> >>>>>>>>DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock
> >>>>>>>>-netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0
> >>>>>>>>-boot c -drive file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
> >>>>>>>>-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >>>>>>>>-vnc :7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet
> >>>>>>>>-monitor stdio -incoming tcp:0:3004
> >>>>>>>>(qemu) QEMU_VM_SECTION_END, after loading ram
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>md_host : after loading all vmstate
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>md_host : after cpu_synchronize_all_post_init
> >>>>>>>>230e1e68ece9cd4e769630e1bcb5ddfb
> >>>>>>>>
> >>>>>>>>This happens occasionally, and it is easier to reproduce when the
> >>>>>>>>migration command is issued during the VM's start-up time.
> >>>>>>>
> >>>>>>>OK, a couple of things. Memory doesn't have to be exactly identical.
> >>>>>>>Virtio devices in particular do funny things on "post-load". There
> >>>>>>>are no guarantees for that as far as I know; we should end up with an
> >>>>>>>equivalent device state in memory.
> >>>>>>>
> >>>>>>>>We have done further tests and found that some pages have been dirtied
> >>>>>>>>but their corresponding migration_bitmap bits are not set.
> >>>>>>>>We can't figure out which module of QEMU missed setting the bitmap
> >>>>>>>>when dirtying the VM's pages; it is very difficult for us to trace
> >>>>>>>>all the actions that dirty the VM's pages.
> >>>>>>>
> >>>>>>>This seems to point to a bug in one of the devices.
> >>>>>>>
> >>>>>>>>Actually, the first time we found this problem was during COLO FT
> >>>>>>>>development, and it triggered some strange issues in the VM which all
> >>>>>>>>pointed to inconsistencies in the VM's memory. (We tried saving all of
> >>>>>>>>the VM's memory to the slave side every time we do a checkpoint in
> >>>>>>>>COLO FT, and then everything was OK.)
> >>>>>>>>
> >>>>>>>>Is it OK for some pages not to be transferred to the destination
> >>>>>>>>during migration? Or is it a bug?
> >>>>>>>
> >>>>>>>Pages transferred should be the same; after device state transmission
> >>>>>>>is when things could change.
> >>>>>>>
> >>>>>>>>This issue has blocked our COLO development... :(
> >>>>>>>>
> >>>>>>>>Any help will be greatly appreciated!
> >>>>>>>
> >>>>>>>Later, Juan.
> >>>>>>>
> >>>>>>.
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >>>
> >>>.
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
> >.
> >
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK