On 09/14/2011 08:18 AM, Thomas Treutner wrote:
Currently, it is possible that a live migration never finishes when the dirty
page rate is high compared to the scan/transfer rate. The exact values for
MAX_MEMORY_ITERATIONS and MAX_TOTAL_MEMORY_TRANSFER_FACTOR are arguable, but
there should be *some* limit to force the final iteration of a live migration
that does not converge.
No, there shouldn't be.
A management app can always stop a guest to force convergence. If you
make migration have unbounded downtime by default, then you're making
migration unsafe for smarter consumers.
You can already set things like maximum downtime to force convergence.
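For example, with the existing monitor command (it takes the tolerated
downtime in seconds, and is the knob behind the migrate_max_downtime()
call in the patch below):

(qemu) migrate_set_downtime 2   # allow up to 2s of downtime to converge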
If you wanted to have some logic like an exponentially increasing
maximum downtime given a fixed timeout, that would be okay provided it
was an optional feature.
So for instance, you could do something like:
downtime: defaults to 30ms
(qemu) migrate_set_convergence_timeout 60   # begin to enforce convergence after 1 minute
At 1 minute, downtime goes to 60ms; at 2 minutes it goes to 120ms, at 3
minutes to 240ms, at 4 minutes to 480ms, at 5 minutes to 0.96s, at 6
minutes to 1.92s, etc.
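A rough sketch of that escalation in C (hypothetical throughout:
migrate_set_convergence_timeout is only a proposal here, and none of
these names exist in the tree):

/* Double the allowed downtime once when the convergence timeout is
 * reached, and again for every further minute of migration. */
static uint64_t effective_max_downtime_ms(uint64_t base_downtime_ms,
                                          uint64_t elapsed_s,
                                          uint64_t convergence_timeout_s)
{
    uint64_t minutes_over;

    if (convergence_timeout_s == 0 || elapsed_s < convergence_timeout_s) {
        return base_downtime_ms;  /* feature off, or timeout not reached */
    }

    minutes_over = (elapsed_s - convergence_timeout_s) / 60;
    if (minutes_over > 20) {
        minutes_over = 20;        /* clamp so the shift cannot overflow */
    }
    return base_downtime_ms << (minutes_over + 1);
}

With the numbers above (30ms base, 60s timeout), this reproduces the
60ms, 120ms, ..., 0.96s, 1.92s progression.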
Regards,
Anthony Liguori
---
arch_init.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/arch_init.c b/arch_init.c
index 4486925..57fcb1e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -89,6 +89,9 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_EOS      0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
+#define MAX_MEMORY_ITERATIONS 10
+#define MAX_TOTAL_MEMORY_TRANSFER_FACTOR 3
+
 static int is_dup_page(uint8_t *page, uint8_t ch)
 {
     uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -107,6 +110,8 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
+static int numberFullMemoryIterations = 0;
+
 static int ram_save_block(QEMUFile *f)
 {
     RAMBlock *block = last_block;
@@ -158,7 +163,10 @@ static int ram_save_block(QEMUFile *f)
             offset = 0;
             block = QLIST_NEXT(block, next);
             if (!block)
+            {
+                numberFullMemoryIterations++;
                 block = QLIST_FIRST(&ram_list.blocks);
+            }
         }
 
     current_addr = block->offset + offset;
@@ -295,7 +303,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 
     expected_time = ram_save_remaining() * TARGET_PAGE_SIZE / bwidth;
 
-    return (stage == 2) && (expected_time <= migrate_max_downtime());
+    return (stage == 2) && ((expected_time <= migrate_max_downtime() || (numberFullMemoryIterations == MAX_MEMORY_ITERATIONS) || (bytes_transferred > (MAX_TOTAL_MEMORY_TRANSFER_FACTOR * ram_bytes_total()))));
 }
 
 static inline void *host_from_stream_offset(QEMUFile *f,
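For reference, the non-convergence described at the top of the thread is
easy to model. A standalone sketch with made-up rates (not part of the
patch):

#include <stdio.h>

int main(void)
{
    double dirty_mb = 4096.0;         /* dirty memory before a pass, MB */
    const double xfer_mb_s  = 1000.0; /* migration transfer rate, MB/s  */
    const double dirty_mb_s = 1500.0; /* guest page-dirtying rate, MB/s */
    int pass;

    for (pass = 1; pass <= 5; pass++) {
        double secs = dirty_mb / xfer_mb_s;  /* time to send this pass  */
        dirty_mb = dirty_mb_s * secs;        /* redirtied while sending */
        printf("after pass %d: %.0f MB dirty\n", pass, dirty_mb);
    }
    /* The dirty set grows by dirty_mb_s / xfer_mb_s (1.5x here) every
     * pass, so expected_time rises instead of dropping below the max
     * downtime and stage 2 never completes on its own. */
    return 0;
}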