Hi

This is work in progress on top of the previous migration series just sent.

- Introduces a thread for migration instead of using a timer and
  callback
- remove the writing to the fd from under the iothread lock
- make the writes synchronous
- Introduce a new pending method that returns how many bytes are
  pending for one save live section
- last patch just adds printfs to see where the time is being spent
  in the migration complete phase.
  (yes, it pollutes all uses of stop on the monitor)

So far I have found that we spend a lot of time in bdrv_flush_all().
It can take from 1ms to 600ms (yes, that is not a typo). That dwarfs
the default migration downtime (30ms).

Stop all vcpus:

- it works now (after the changes to qemu_cpu_is_vcpu in the previous
  series); the caveat is that the time that bdrv_flush_all() takes is
  "unpredictable". Any silver bullets?

  Paolo suggested to call, for the migration completion phase:

  bdrv_aio_flush_all();
  Send the dirty pages;
  bdrv_drain_all()
  bdrv_flush_all()
  another round through the bitmap in case the completions have
  changed some page

  Paolo, did I get it right?
  Any other suggestion?

- migrate_cancel() is not properly implemented (as in the film: we
  take no locks, ...)

- expected_downtime is not calculated.

  I am about to merge migrate_fd_put_ready & buffered_thread() and
  that would make it trivial to calculate.

It outputs something like:

wakeup_request 0
time cpu_disable_ticks 0
time pause_all_vcpus 1
time runstate_set 1
time vmstate_notify 2
time bdrv_drain_all 2
time flush device /dev/disk/by-path/ip-192.168.10.200:3260-iscsi-iqn.2010-12.org.trasno:iscsi.lvm-lun-1: 3
time flush device : 3
time flush device : 3
time flush device : 3
time bdrv_flush_all 5
time monitor_protocol_event 5
vm_stop 2 5
synchronize_all_states 1
migrate RAM 37
migrate rest devices 1
complete without error 3a 44
completed 45
end completed stage 45

As you can see, we estimate that we can send all pending data in 30ms,
and it took 37ms to send the RAM (that is what we calculate). So the
estimation is quite good.
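
To make the estimate above concrete, here is a minimal sketch of how the
downtime check could be computed from the number of pending bytes and the
measured bandwidth. The function names and the bytes-per-millisecond unit
are assumptions made for illustration; this is not the code in the series.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU function: estimate how many
 * milliseconds it takes to send pending_bytes at bytes_per_ms.
 * Rounds up, and returns UINT64_MAX when no bandwidth was measured.
 */
static uint64_t expected_downtime_ms(uint64_t pending_bytes,
                                     uint64_t bytes_per_ms)
{
    if (bytes_per_ms == 0) {
        return UINT64_MAX;
    }
    /* Round up without overflowing pending_bytes + bytes_per_ms - 1. */
    return pending_bytes / bytes_per_ms +
           (pending_bytes % bytes_per_ms != 0);
}

/*
 * Hypothetical helper: returns 1 when the remaining data is expected
 * to fit in the allowed downtime, so the guest can be stopped and the
 * completion phase entered; returns 0 otherwise.
 */
static int can_enter_completion(uint64_t pending_bytes,
                                uint64_t bytes_per_ms,
                                uint64_t max_downtime_ms)
{
    return expected_downtime_ms(pending_bytes, bytes_per_ms)
           <= max_downtime_ms;
}
```

With the default downtime of 30ms, 3000 pending bytes at 100 bytes/ms
would just fit, while one more byte would not. Note that this only
covers the time to send the RAM; it says nothing about the time spent
in bdrv_flush_all().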

What gives me lots of variation is the "time flush device" line that
has the device name. That is what varies between 1ms and 600ms.

This is in a completely idle guest. I am running:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* ms2us() is assumed to be a plain ms -> us conversion */
    #define ms2us(ms) ((ms) * 1000)

    int main(void)
    {
        struct timeval t0, t1;

        while (1) {
            uint64_t delay;

            if (gettimeofday(&t0, NULL) != 0)
                perror("gettimeofday 1");
            if (usleep(ms2us(10)) != 0)
                perror("usleep");
            if (gettimeofday(&t1, NULL) != 0)
                perror("gettimeofday 2");
            /* t1 = t1 - t0, borrowing one second if needed */
            t1.tv_usec -= t0.tv_usec;
            if (t1.tv_usec < 0) {
                t1.tv_usec += 1000000;
                t1.tv_sec--;
            }
            t1.tv_sec -= t0.tv_sec;
            delay = t1.tv_sec * 1000 + t1.tv_usec / 1000;
            if (delay > 100)
                printf("delay of %" PRIu64 " ms\n", delay);
        }
    }

to see the latency inside the guest (i.e. ask for a 10ms sleep, and
see how long it takes).

[root@d1 ~]# ./timer
delay of 161 ms
delay of 135 ms
delay of 143 ms
delay of 132 ms
delay of 131 ms
delay of 141 ms
delay of 113 ms
delay of 119 ms
delay of 114 ms

But those values are independent of migration. Without even starting
the migration, with an idle guest doing nothing, we get them sometimes.

Comments?

Thanks, Juan.

The following changes since commit 4bce0b88b10ed790ad3669ce4ff61c945cd655eb:

  cpus: create qemu_cpu_is_vcpu() (2012-09-21 10:43:10 +0200)

are available in the git repository at:

  http://repo.or.cz/r/qemu/quintela.git migration-thread-v3

for you to fetch changes up to 0e0f8dfd9fc308b790e55ceca5c2c193e1802417:

  migration: print times for end phase (2012-09-21 11:52:20 +0200)

Juan Quintela (11):
      buffered_file: Move from using a timer to use a thread
      migration: make qemu_fopen_ops_buffered() return void
      migration: stop all cpus correctly
      migration: make writes blocking
      migration: remove unfreeze logic
      migration: take finer locking
      buffered_file: Unfold the trick to restart generating migration data
      buffered_file: don't flush on put buffer
      buffered_file: unfold buffered_append in buffered_put_buffer
      savevm: New save live migration method: pending
      migration: print times for end phase

Paolo Bonzini (1):
      split MRU ram list

Umesh Deshpande (2):
      add a version number to ram_list
      protect the ramlist with a separate mutex

 arch_init.c       |  62 +++++++++++++-------------
 block-migration.c |  49 +++++---------------
 block.c           |   6 +++
 buffered_file.c   | 130 +++++++++++++++++++++++++-----------------------------
 buffered_file.h   |   2 +-
 cpu-all.h         |  13 +++++-
 cpus.c            |  17 +++++++
 exec.c            |  43 +++++++++++++++---
 migration-exec.c  |   2 -
 migration-fd.c    |   6 ---
 migration-tcp.c   |   2 +-
 migration-unix.c  |   2 -
 migration.c       | 108 +++++++++++++++++++--------------------------
 migration.h       |   4 +-
 qemu-file.h       |   5 ---
 savevm.c          |  37 +++++++++++++---
 sysemu.h          |   1 +
 vmstate.h         |   1 +
 18 files changed, 255 insertions(+), 235 deletions(-)

-- 
1.7.11.4