On Tue, Aug 02, 2011 at 03:45:56PM +0200, Shribman, Aidan wrote:
> Subject: [PATCH v3] XBZRLE delta for live migration of large memory apps
> From: Aidan Shribman <aidan.shrib...@sap.com>
>
> By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM downtime
> and total live-migration time for VMs running memory write intensive workloads
> typical of large enterprise applications such as SAP ERP Systems, and generally
> speaking for representative of any application with a sparse memory update
> pattern.
>
> On the sender side XBZRLE is used as a compact delta encoding of page updates,
> retrieving the old page content from an LRU cache (default size of 64 MB). The
> receiving side uses the existing page content and XBZRLE to decode the new page
> content.
>
> Work was originally based on research results published VEE 2011: Evaluation of
> Delta Compression Techniques for Efficient Live Migration of Large Virtual
> Machines by Benoit, Svard, Tordsson and Elmroth. Additionally the delta encoder
> XBRLE was improved further using XBZRLE instead.
>
> XBZRLE has a sustained bandwidth of 1.5-2.2 GB/s for typical workloads making it
> ideal for in-line, real-time encoding such as is needed for live-migration.
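
To make the sender/receiver split described above concrete, here is a minimal sketch of an XOR delta with zero-run-length encoding. It is not the patch's wire format: the record layout (16-bit zero-run length, 16-bit data-run length, then the XORed bytes), the byte-wise granularity, and the names toy_delta_encode(), toy_delta_decode() and TOY_PAGE_SIZE are all assumptions made for illustration. The encoder writes straight into the caller's buffer and returns -1 when the result would not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096 /* assumption: stands in for TARGET_PAGE_SIZE */

/* Store a run length as 16 bits, little-endian. */
static void put_u16(uint8_t *dst, size_t v)
{
    assert(v <= 0xffff);
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}

static size_t get_u16(const uint8_t *src)
{
    return (size_t)src[0] | ((size_t)src[1] << 8);
}

/*
 * Sender side: encode old XOR curr as [zero run][data run][data bytes]
 * records.  Returns the encoded length, or -1 if it exceeds max_len.
 */
static int toy_delta_encode(uint8_t *out, size_t max_len,
                            const uint8_t *old, const uint8_t *curr)
{
    size_t i = 0, len = 0;

    while (i < TOY_PAGE_SIZE) {
        size_t zstart = i;
        while (i < TOY_PAGE_SIZE && old[i] == curr[i]) {
            i++;
        }
        if (i == TOY_PAGE_SIZE) {
            break; /* trailing unchanged bytes need no record */
        }
        size_t dstart = i;
        while (i < TOY_PAGE_SIZE && old[i] != curr[i]) {
            i++;
        }
        size_t zrun = dstart - zstart;
        size_t drun = i - dstart;
        if (len + 4 + drun > max_len) {
            return -1; /* caller falls back to sending the full page */
        }
        put_u16(out + len, zrun);
        put_u16(out + len + 2, drun);
        len += 4;
        for (size_t k = 0; k < drun; k++) {
            out[len++] = old[dstart + k] ^ curr[dstart + k];
        }
    }
    return (int)len;
}

/*
 * Receiver side: page holds the old content and is patched in place.
 * Returns 0 on success, -1 if the encoded stream is malformed.
 */
static int toy_delta_decode(uint8_t *page, const uint8_t *enc, size_t enc_len)
{
    size_t pos = 0, i = 0;

    while (pos < enc_len) {
        if (enc_len - pos < 4) {
            return -1;
        }
        size_t zrun = get_u16(enc + pos);
        size_t drun = get_u16(enc + pos + 2);
        pos += 4;
        if (zrun > TOY_PAGE_SIZE - i || drun > TOY_PAGE_SIZE - i - zrun ||
            drun > enc_len - pos) {
            return -1;
        }
        i += zrun; /* unchanged bytes: nothing to do */
        for (size_t k = 0; k < drun; k++) {
            page[i++] ^= enc[pos++];
        }
    }
    return 0;
}
```

With a sparse update pattern only the changed bytes and a few run headers are sent, which is where the saving over transmitting the whole page comes from.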

What is the CPU cost of xbzrle live migration on the source host?  I'm
thinking about a graph showing CPU utilization (e.g. from mpstat(1))
that has two datasets: migration without xbzrle and migration with
xbzrle.

> @@ -128,28 +288,35 @@ static int ram_save_block(QEMUFile *f)
>                                              current_addr + TARGET_PAGE_SIZE,
>                                              MIGRATION_DIRTY_FLAG);
>
> -            p = block->host + offset;
> +            if (arch_mig_state.use_xbrle) {
> +                p = qemu_mallocz(TARGET_PAGE_SIZE);

qemu_malloc()

> +static uint8_t count_hash_bits(uint64_t v)
> +{
> +    uint8_t bits = 0;
> +
> +    while (!(v & 1)) {
> +        v = v >> 1;
> +        bits++;
> +    }
> +    return bits;
> +}

See ffs(3).  ffsll() does what you need.

> +static uint8_t xor_buf[TARGET_PAGE_SIZE];
> +static uint8_t xbzrle_buf[TARGET_PAGE_SIZE * 2];

Do these need to be static globals?  It should be fine to define them as
local variables inside the functions that need them, there is enough
stack space.

> +
> +int xbzrle_encode(uint8_t *xbzrle, const uint8_t *old, const uint8_t *curr,
> +    const size_t max_compressed_len)
> +{
> +    int compressed_len;
> +
> +    xor_encode_word(xor_buf, old, curr);
> +    compressed_len = rle_encode((uint64_t *)xor_buf,
> +        sizeof(xor_buf)/sizeof(uint64_t), xbzrle_buf,
> +        sizeof(xbzrle_buf));
> +    if (compressed_len > max_compressed_len) {
> +        return -1;
> +    }
> +    memcpy(xbzrle, xbzrle_buf, compressed_len);

Why the intermediate xbzrle_buf buffer and why the memcpy()?

    return rle_encode((uint64_t *)xor_buf, sizeof(xor_buf) / sizeof(uint64_t),
                      xbzrle, max_compressed_len);

Stefan
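
For reference, a minimal sketch of the ffs(3) suggestion above. The helper name trailing_zero_bits() is hypothetical, and the sketch assumes a C library that provides ffsll() (glibc does, as an extension). ffsll() returns the 1-based position of the least significant set bit, or 0 when no bit is set, so subtracting one gives the same count as the quoted loop. Note that the quoted loop never terminates for v == 0, whereas this version reports that case as -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* ffsll() is an extension, not ISO C */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical replacement for count_hash_bits(): number of trailing
 * zero bits in v, or -1 if v is 0 (no bit set).
 */
static int trailing_zero_bits(uint64_t v)
{
    int pos = ffsll((long long)v); /* 1-based index of lowest set bit */

    return pos ? pos - 1 : -1;
}
```

For example, trailing_zero_bits(8) yields 3, matching what the loop would compute for the same input.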