On 11/6/2013 5:04 AM, Juan Quintela wrote:
Hi
[v2]
In this version:
- addressed all the comments from the last version (thanks, Eric)
- the kvm migration bitmap is synchronized using bitmap operations
- the qemu bitmap -> migration bitmap is synchronized using bitmap operations
If the bitmaps are not properly aligned, we fall back to the old code
(a sketch of the idea follows below).
The code survives virt-tests, so it should be in quite good shape.
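For context, the shape of the optimization is roughly this (a minimal
sketch, not the actual patch code; the function and variable names here
are made up for illustration):

    /* Merge the source bitmap into the destination one long at a time
     * when the range is word-aligned; fall back to the old bit-by-bit
     * path otherwise.  Illustrative names, not the patch code. */
    static void sync_bitmap_range(unsigned long *dst, const unsigned long *src,
                                  unsigned long start, unsigned long nr)
    {
        if ((start % BITS_PER_LONG) == 0 && (nr % BITS_PER_LONG) == 0) {
            unsigned long i, len = nr / BITS_PER_LONG;
            unsigned long *d = dst + start / BITS_PER_LONG;
            const unsigned long *s = src + start / BITS_PER_LONG;

            for (i = 0; i < len; i++) {
                d[i] |= s[i];               /* 64 pages per operation */
            }
        } else {
            unsigned long i;

            for (i = 0; i < nr; i++) {      /* old path, one bit at a time */
                if (test_bit(start + i, src)) {
                    set_bit(start + i, dst);
                }
            }
        }
    }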
ToDo list:
- vga ram is by default not aligned to a page number that is a multiple
of 64, so it could be optimized. Kraxel? It syncs the kvm bitmap at
least once a second or so, and the bitmap is only 2048 pages (16MB by
default). We would only need to change the ram_addr.
- vga: still more, after migration finishes, the vga code continues
synchronizing the kvm bitmap on the source machine. Notice that there
is no graphics client connected to the VGA. Worth investigating?
- I haven't yet measured speed differences on big hosts. Vinod?
- Depending on performance, there are more optimizations to do.
- debugging printf's are still in the code, just to see whether we are
taking the optimized paths or not.
And that is all. Please test & comment.
Thanks, Juan.
[v1]
This series splits the dirty bitmap (8 bits per page, only three of them
used) into 3 individual bitmaps (a rough sketch follows below). Once the
conversion is done, updates are handled by bitmap operations, not bit
by bit.
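To give an idea of the direction (a rough sketch under assumed names;
the real code sits behind the cpu_physical_memory_* accessors, so the
variable layout here is illustrative):

    /* One bitmap per dirty-memory client instead of one byte
     * (8 flag bits) per page.  Sketch only. */
    enum {
        DIRTY_MEMORY_VGA       = 0,
        DIRTY_MEMORY_CODE      = 1,
        DIRTY_MEMORY_MIGRATION = 2,
        DIRTY_MEMORY_NUM       = 3,
    };

    static unsigned long *dirty_memory[DIRTY_MEMORY_NUM];

    /* Query a single dirty bit for one client. */
    static inline bool is_page_dirty(ram_addr_t addr, unsigned client)
    {
        return test_bit(addr >> TARGET_PAGE_BITS, dirty_memory[client]);
    }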
- *_DIRTY_FLAG flags are gone, now we use memory.h DIRTY_MEMORY_*
everywhere.
- We set/reset each flag individually; combined calls like
set_dirty_flags(0xff & ~CODE_DIRTY_FLAG) are gone.
- Several functions are renamed to clarify things and make them consistent.
- I know it doesn't pass checkpatch because of long lines; a proper
submission should pass it. We have to choose between long lines, short
variable names, or ugly line splitting :p
- DIRTY_MEMORY_NUM: how can one include exec/memory.h into cpu-all.h?
A plain #include doesn't work, so as a workaround I have copied its
value. Any better idea? I can always create "exec/migration-flags.h", though.
- The meat of the code is in patch 19. The rest of the patches are quite
easy (even that one is not too complex).
The only optimizations done so far are
set_dirty_range()/clear_dirty_range(), which now operate with
bitmap_set()/bitmap_clear(); see the sketch below.
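Roughly like this (a sketch of the shape of the change, reusing the
illustrative dirty_memory[] layout from above, not the literal patch):

    /* Dirty a whole range with one bitmap_set() call per client
     * bitmap instead of looping over every page. */
    static void cpu_physical_memory_set_dirty_range(ram_addr_t start,
                                                    ram_addr_t length)
    {
        unsigned long page = start >> TARGET_PAGE_BITS;
        unsigned long end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;

        bitmap_set(dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - page);
        bitmap_set(dirty_memory[DIRTY_MEMORY_VGA], page, end - page);
        bitmap_set(dirty_memory[DIRTY_MEMORY_CODE], page, end - page);
        xen_modified_memory(start, length);
    }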
Note for Xen: cpu_physical_memory_set_dirty_range() was wrong for Xen;
see the comment on the patch.
It passes virt-test migration tests, so it should be perfect.
I post it to ask for comments.
ToDo list:
- create a lock for the bitmaps and fold the migration bitmap into this
one. This would avoid a copy and make things easier?
- As this code uses/abuses bitmaps, we need to change the type of the
index from int to long. With an int index we can only address a
maximum of 8TB of guest RAM (2^31 pages * 4KB/page = 8TB; yes, this is
not urgent, we have a couple of years to do it).
- merging the KVM <-> QEMU bitmap as a bitmap and not bit by bit.
- splitting the KVM bitmap synchronization into chunks, i.e. not
synchronizing all memory at once, just enough to continue with
migration (see the sketch just below).
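The chunked synchronization could look something like this (purely
hypothetical, nothing in this series implements it;
memory_dirty_log_sync_range() is an invented name):

    /* Hypothetical: sync the KVM dirty log one fixed-size chunk per
     * call instead of walking all of guest memory at once. */
    #define SYNC_CHUNK_PAGES (1024 * 1024)    /* 4GB worth of 4KB pages */

    static void sync_dirty_log_chunk(ram_addr_t *cursor, ram_addr_t ram_size)
    {
        ram_addr_t start = *cursor;
        ram_addr_t length = MIN((ram_addr_t)SYNC_CHUNK_PAGES * TARGET_PAGE_SIZE,
                                ram_size - start);

        memory_dirty_log_sync_range(start, length);  /* invented API */
        *cursor = (start + length) % ram_size;
    }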
Any further ideas/needs?
Thanks, Juan.
PS: Why did it take so long?
Because I was trying to integrate the bitmap into the MemoryRegion
abstraction. It would have made the code cleaner, but I hit dead end
after dead end. In practical terms, TCG doesn't know about
MemoryRegions; it has been ported to run on top of them, but doesn't
use them effectively.
The following changes since commit c2d30667760e3d7b81290d801e567d4f758825ca:
rtc: remove dead SQW IRQ code (2013-11-05 20:04:03 -0800)
are available in the git repository at:
git://github.com/juanquintela/qemu.git bitmap-v2.next
for you to fetch changes up to d91eff97e6f36612eb22d57c2b6c2623f73d3997:
migration: synchronize memory bitmap 64bits at a time (2013-11-06 13:54:56 +0100)
----------------------------------------------------------------
Juan Quintela (39):
Move prototypes to memory.h
memory: cpu_physical_memory_set_dirty_flags() result is never used
memory: cpu_physical_memory_set_dirty_range() return void
exec: use accessor function to know if memory is dirty
memory: create function to set a single dirty bit
exec: create function to get a single dirty bit
memory: make cpu_physical_memory_is_dirty return bool
exec: simplify notdirty_mem_write()
memory: all users of cpu_physical_memory_get_dirty used only one flag
memory: set single dirty flags when possible
memory: cpu_physical_memory_set_dirty_range() always dirty all flags
memory: cpu_physical_memory_mask_dirty_range() always clear a single flag
memory: use DIRTY_MEMORY_* instead of *_DIRTY_FLAG
memory: use bit 2 for migration
memory: make sure that client is always inside range
memory: only resize dirty bitmap when memory size increases
memory: cpu_physical_memory_clear_dirty_flag() result is never used
bitmap: Add bitmap_zero_extend operation
memory: split dirty bitmap into three
memory: unfold cpu_physical_memory_clear_dirty_flag() in its only user
memory: unfold cpu_physical_memory_set_dirty() in its only user
memory: unfold cpu_physical_memory_set_dirty_flag()
memory: make cpu_physical_memory_get_dirty() the main function
memory: cpu_physical_memory_get_dirty() is used as returning a bool
memory: s/mask/clear/ cpu_physical_memory_mask_dirty_range
memory: use find_next_bit() to find dirty bits
memory: cpu_physical_memory_set_dirty_range() now uses bitmap operations
memory: cpu_physical_memory_clear_dirty_range() now uses bitmap operations
memory: s/dirty/clean/ in cpu_physical_memory_is_dirty()
memory: make cpu_physical_memory_reset_dirty() take a length parameter
memory: cpu_physical_memory_set_dirty_tracking() should return void
memory: split cpu_physical_memory_* functions to its own include
memory: unfold memory_region_test_and_clear()
kvm: use directly cpu_physical_memory_* api for tracking dirty pages
kvm: refactor start address calculation
memory: move bitmap synchronization to its own function
memory: synchronize kvm bitmap using bitmap operations
ram: split function that synchronizes a range
migration: synchronize memory bitmap 64bits at a time
arch_init.c | 57 ++++++++++++----
cputlb.c | 10 +--
exec.c | 75 ++++++++++-----------
include/exec/cpu-all.h | 4 +-
include/exec/cpu-common.h | 4 --
include/exec/memory-internal.h | 84 ------------------------
include/exec/memory-physical.h | 143 +++++++++++++++++++++++++++++++++++++++++
include/exec/memory.h | 10 +--
include/qemu/bitmap.h | 9 +++
kvm-all.c | 28 ++------
memory.c | 17 ++---
11 files changed, 260 insertions(+), 181 deletions(-)
create mode 100644 include/exec/memory-physical.h
Tested-by: Chegu Vinod <chegu_vi...@hp.com>
-------
Hi Juan,
Here are some results from migrating a couple of *big fat* guests using TCP
migration and RDMA migration; the last run was with a workload. As one
would expect, there were noticeable improvements.
Please see below.
FYI
Vinod
------
Migrate speed: 20G
Migrate downtime: 2s
I) Without Juan's bitmap optimization patches (i.e. current upstream):
Freezes were observed during the start and at times during the pre-copy phase.
Longer-than-expected downtime.
a) 20VCPU/256GB (TCP migration)
A freeze of ~1 second in the guest (as measured by Juan's timer script).
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 97048 milliseconds
downtime: 3740 milliseconds
setup: 6912 milliseconds
transferred ram: 5734321 kbytes
throughput: 4243.94 mbps
remaining ram: 0 kbytes
total ram: 268444252 kbytes
duplicate: 65856255 pages
skipped: 0 pages
normal: 1286361 pages
normal bytes: 5145444 kbytes
b) 40VCPU/512GB (TCP migration)
A freeze of ~7 seconds in the guest (as measured by Juan's timer script).
info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 238957 milliseconds
downtime: 5700 milliseconds
setup: 14062 milliseconds
transferred ram: 10461990 kbytes
throughput: 4223.74 mbps
remaining ram: 0 kbytes
total ram: 536879712 kbytes
duplicate: 131953694 pages
skipped: 0 pages
normal: 2321019 pages
normal bytes: 9284076 kbytes
------
II) With Juan's v2 bitmap optimization patches :
The actual downtime is lower and closer to the expected value... and that's good!
No multi-second freezes inside the guest during the start of migration or
during the pre-copy phase!
a) 20VCPU/256GB (TCP migration)
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 84626 milliseconds
downtime: 1893 milliseconds
setup: 296 milliseconds
transferred ram: 5791133 kbytes
throughput: 4841.76 mbps
remaining ram: 0 kbytes
total ram: 268444252 kbytes
duplicate: 65841383 pages
skipped: 0 pages
normal: 1300569 pages
normal bytes: 5202276 kbytes
b) 40VCPU/512GB (TCP migration)
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 239477 milliseconds
downtime: 1508 milliseconds
setup: 1171 milliseconds
transferred ram: 10584740 kbytes
throughput: 3570.72 mbps
remaining ram: 0 kbytes
total ram: 536879712 kbytes
duplicate: 131934489 pages
skipped: 0 pages
normal: 2351688 pages
normal bytes: 9406752 kbytes
c) 40VCPU/512GB (RDMA migration)
(qemu) info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 174542 milliseconds
downtime: 1697 milliseconds
setup: 1140 milliseconds
transferred ram: 11739842 kbytes
throughput: 9987.07 mbps
remaining ram: 0 kbytes
total ram: 536879712 kbytes
duplicate: 131722040 pages
skipped: 0 pages
normal: 2902087 pages
normal bytes: 11608348 kbytes
d) 40VCPU/512GB (RDMA migration)
(Guest running SpecJBB2005 with 24 warehouse threads; guest was ~60% loaded.)
info migrate
capabilities: xbzrle: off x-rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: completed
total time: 154236 milliseconds
downtime: 2314 milliseconds
setup: 575 milliseconds
transferred ram: 19198318 kbytes
throughput: 19404.17 mbps
remaining ram: 0 kbytes
total ram: 536879712 kbytes
duplicate: 130647356 pages
skipped: 0 pages
normal: 4766514 pages
normal bytes: 19066056 kbytes