On 2/15/2013 9:46 AM, Paolo Bonzini wrote:
This series does many of the improvements that the migration thread
promised.  It removes buffering, lets a large amount of code run outside
the big QEMU lock, and removes some duplication between incoming and
outgoing migration.

Patches 1 to 7 are simple cleanups.

Patches 8 to 14 simplify the lifecycle of the migration thread and
the migration QEMUFile.

Patches 15 to 18 add fine-grained locking to the block migration
data structures, so that patches 19 to 21 can move RAM/block live
migration out of the big QEMU lock.  At this point blocking writes
will not starve other threads seeking to grab the big QEMU mutex:
patches 22 to 24 remove the buffering and clean up the code.
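
The idea, as a minimal sketch (illustrative only, not the actual patch
code; the names here are made up), is that the block-migration state
gets its own QemuMutex so the iteration no longer needs the BQL:

/* Illustrative sketch only: a dedicated mutex guards the
 * block-migration state so the iteration can run without holding
 * the big QEMU lock (BQL). */
#include "qemu/thread.h"

static QemuMutex blk_mig_lock;   /* hypothetical name */

static void blk_mig_iterate_sketch(void)
{
    /* Called without the BQL; only the subsystem lock is taken,
     * so a blocking write here cannot starve threads waiting on
     * the BQL. */
    qemu_mutex_lock(&blk_mig_lock);
    /* ... walk the dirty-block list and queue I/O ... */
    qemu_mutex_unlock(&blk_mig_lock);
}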

Patches 25 to 28 are more cleanups.

Patches 29 to 33 improve QEMUFile so that patches 34 and 35 can
use QEMUFile to write out data, instead of MigrationState.
Patches 36 to 41 can then remove the useless QEMUFile wrapper
that remains.

Please review and test!  You can find these patches at
git://github.com/bonzini/qemu.git, branch migration-thread-20130115.

Juan Quintela (1):
   Rename buffered_ to migration_

Paolo Bonzini (40):
   migration: simplify while loop
   migration: always use vm_stop_force_state
   migration: move more error handling to migrate_fd_cleanup
   migration: push qemu_savevm_state_cancel out of qemu_savevm_state_*
   block-migration: remove useless calls to blk_mig_cleanup
   qemu-file: pass errno from qemu_fflush via f->last_error
   migration: use qemu_file_set_error to pass error codes back to
     qemu_savevm_state
   qemu-file: temporarily expose qemu_file_set_error and qemu_fflush
   migration: flush all data to fd when buffered_flush is called
   migration: use qemu_file_set_error
   migration: simplify error handling
   migration: do not nest flushing of device data
   migration: prepare to access s->state outside critical sections
   migration: cleanup migration (including thread) in the iothread
   block-migration: remove variables that are never read
   block-migration: small preparatory changes for locking
   block-migration: document usage of state across threads
   block-migration: add lock
   migration: reorder SaveVMHandlers members
   migration: run pending/iterate callbacks out of big lock
   migration: run setup callbacks out of big lock
   migration: yay, buffering is gone
   qemu-file: make qemu_fflush and qemu_file_set_error private again
   migration: eliminate last_round
   migration: detect error before sleeping
   migration: remove useless qemu_file_get_error check
   migration: use qemu_file_rate_limit consistently
   migration: merge qemu_popen_cmd with qemu_popen
   qemu-file: fsync a writable stdio QEMUFile
   qemu-file: check exit status when closing a pipe QEMUFile
   qemu-file: add writable socket QEMUFile
   qemu-file: simplify and export qemu_ftell
   migration: use QEMUFile for migration channel lifetime
   migration: use QEMUFile for writing outgoing migration data
   migration: use qemu_ftell to compute bandwidth
   migration: small changes around rate-limiting
   migration: move rate limiting to QEMUFile
   migration: move contents of migration_close to migrate_fd_cleanup
   migration: eliminate s->migration_file
   migration: inline migrate_fd_close

  arch_init.c                   |   14 ++-
  block-migration.c             |  167 +++++++++++++++------
  docs/migration.txt            |   20 +---
  include/migration/migration.h |   12 +--
  include/migration/qemu-file.h |   21 +--
  include/migration/vmstate.h   |   21 ++-
  include/qemu/atomic.h         |    1 +
  include/sysemu/sysemu.h       |    6 +-
  migration-exec.c              |   39 +-----
  migration-fd.c                |   47 +------
  migration-tcp.c               |   33 +----
  migration-unix.c              |   33 +----
  migration.c                   |  345 ++++++++---------------------------------
  savevm.c                      |  214 +++++++++++++++-----------
  util/osdep.c                  |    6 +-
  15 files changed, 367 insertions(+), 612 deletions(-)


I'm still in the midst of reviewing the changes but gave them a try. The following are my preliminary observations:

- The multi-second freezes at the start of migration of larger guests (i.e. 128GB and higher) aren't observable with the above changes. (The simple timer script that does a gettimeofday every 100ms didn't complain about delays, etc.; a sketch of such a timer follows this list.)

- Noticed improvements in bandwidth utilization during the iterative pre-copy phase and during the "downtime" phase.

- The total migration time was reduced, more so for larger guests. (Note: the undesirably large actual "downtime" for larger guests is a different topic that still needs to be addressed independently of these changes.)
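
For reference, here is a minimal sketch of what such a freeze-detection timer could look like (the actual ./timer tool isn't shown in this report, so the structure and the 300 ms reporting threshold are assumptions):

/* Minimal freeze-detector sketch, similar in spirit to the ./timer
 * tool mentioned above.  It sleeps 100 ms at a time and reports
 * whenever the observed gap is much larger than that. */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval prev, now;
    long ms;

    gettimeofday(&prev, NULL);
    for (;;) {
        usleep(100 * 1000);             /* nominal 100 ms tick */
        gettimeofday(&now, NULL);
        ms = (now.tv_sec - prev.tv_sec) * 1000 +
             (now.tv_usec - prev.tv_usec) / 1000;
        if (ms > 300) {                 /* assumed threshold */
            printf("delay of %ld ms\n", ms);
        }
        prev = now;
    }
    return 0;
}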

Some details follow below...

Thanks
Vinod


Details:
----------

Host and guest kernels are running 3.8-rc5.

Comparing upstream (QEMU 1.4.50) vs. Paolo's branch (QEMU 1.3.92 based), i.e.
git clone git://github.com/bonzini/qemu.git -b migration-thread-20130115

The first set of experiments was with [not-so-interesting] *idle* guests of different sizes.
The second experiment was with an OLTP workload.

A) Idle guests:
--------------------
(The migration speed was set to 10G and the downtime was set to 2s.)
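
These settings were applied via the QEMU monitor, presumably along these lines (the destination host and port are illustrative):

(qemu) migrate_set_speed 10G
(qemu) migrate_set_downtime 2
(qemu) migrate -d tcp:<dest-host>:4444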

1) 5vcpu/32G  - *idle* guest

QEMU 1.4.50:
total time: 31801 milliseconds
downtime: 2831 milliseconds

Paolo's branch:
total time: 29012 milliseconds
downtime: 1987 milliseconds

--
2) 10vcpu/64G - *idle* guest

QEMU 1.4.50:
total time: 62699 milliseconds
downtime: 2506 milliseconds

Paolo's branch:
total time: 59174 milliseconds
downtime: 2451 milliseconds

--
3) 10vcpu/128G  - *idle* guest

QEMU 1.4.50:
total time: 123179 milliseconds
downtime: 2566 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 3083 ms   <- freeze (@ start of migration)
delay of 1916 ms   <- freeze (due to downtime)

Paolo's branch:
total time: 116809 milliseconds
downtime: 2703 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 2820 ms   <- freeze (due to downtime)

--
4) 20vcpu/256G - *idle* guest

QEMU 1.4.50:
total time: 277775 milliseconds
downtime: 3718 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 6317 ms   <- freeze (@ start of migration)
delay of 2952 ms   <- freeze (due to downtime)

Paolo's branch:
total time: 261790 milliseconds
downtime: 3809 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 3982 ms   <- freeze (due to downtime)

--
5) 40vcpu/512G - *idle* guest

QEMU 1.4.50:
total time: 631654 milliseconds
downtime: 7252 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 12713 ms  <- freeze (@ start of migration)
delay of 6099 ms   <- freeze (due to downtime)

Paolo's branch:
total time: 603252 milliseconds
downtime: 6452 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 6724 ms   <- freeze (due to downtime)

--
6) 80vcpu/784G - *idle* guest

QEMU 1.4.50:
total time: 1003210 milliseconds
downtime: 8932 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 18941 ms  <- freeze (@ start of migration)
delay of 8395 ms   <- freeze (due to downtime)
delay of 2451 ms   <- freeze (on new host... why?)

Paolo's branch:
total time: 959378 milliseconds
downtime: 8416 milliseconds

[root@h11-kvm1 ~]# ./timer
delay of 8938 ms   <- freeze (due to downtime)
delay of 935 ms    <- freeze (on new host... why?)

-------

B) Guest with an OLTP workload:
---------------------------------------------

Guest: 80vcpu/784GB (yes, I know that typical guest sizes today aren't this huge... but this is just an experiment, keeping in mind that guests continue to get fatter).

OLTP workload with 100 users doing writes/reads. Using tmpfs... as I don't yet have access to real I/O :-(

Host was ~70% busy and the guest was ~60% busy.

The migration speed was set to 10G and the downtime was set to 4s.

No guest freezes were observed, but there were significant drops in TPS at the start of migration, etc. Observed a ~30-40% improvement in bandwidth utilization during the iterative pre-copy phase.

The migration did NOT converge even after 30 mins or so... with either upstream QEMU or with Paolo's changes. (Note: the lack-of-convergence issue needs to be pursued separately, based on ideas proposed in the past.)
