On Tue, Aug 02, 2011 at 04:01:06PM +0200, Alexander Graf wrote:
>
> On 02.08.2011, at 15:45, Shribman, Aidan wrote:
>
> > Subject: [PATCH v3] XBZRLE delta for live migration of large memory apps
> > From: Aidan Shribman <aidan.shrib...@sap.com>
> >
> > By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM
> > downtime and total live-migration time for VMs running memory write
> > intensive workloads typical of large enterprise applications such as
> > SAP ERP systems, and generally speaking for any application with a
> > sparse memory update pattern.
> >
> > On the sender side XBZRLE is used as a compact delta encoding of page
> > updates, retrieving the old page content from an LRU cache (default
> > size of 64 MB). The receiving side uses the existing page content and
> > the XBZRLE delta to decode the new page content.
> >
> > Work was originally based on research results published at VEE 2011:
> > "Evaluation of Delta Compression Techniques for Efficient Live
> > Migration of Large Virtual Machines" by Benoit, Svard, Tordsson and
> > Elmroth. The delta encoder was further improved by using XBZRLE in
> > place of the original XBRLE.
> >
> > XBZRLE has a sustained bandwidth of 1.5-2.2 GB/s for typical
> > workloads, making it ideal for in-line, real-time encoding such as is
> > needed for live migration.
> >
> > A typical usage scenario:
> >     {qemu} migrate_set_cachesize 256m
> >     {qemu} migrate -x -d tcp:destination.host:4444
> >     {qemu} info migrate
> >     ...
> >     transferred ram-duplicate: A kbytes
> >     transferred ram-duplicate: B pages
> >     transferred ram-normal: C kbytes
> >     transferred ram-normal: D pages
> >     transferred ram-xbrle: E kbytes
> >     transferred ram-xbrle: F pages
> >     overflow ram-xbrle: G pages
> >     cache-hit ram-xbrle: H pages
> >     cache-lookup ram-xbrle: J pages
> >
> > Testing: live migration with XBZRLE completed in 110 seconds; without
> > XBZRLE, live migration was not able to complete.
> >
> > A simple synthetic memory r/w load generator:
> >
> >     #include <stdlib.h>
> >     #include <stdio.h>
> >
> >     int main()
> >     {
> >         /* allocate a 16 MB buffer and dirty every page forever */
> >         char *buf = (char *) calloc(4096, 4096);
> >         while (1) {
> >             int i;
> >             for (i = 0; i < 4096 * 4; i++) {
> >                 buf[i * 4096 / 4]++;
> >             }
> >             printf(".");
> >         }
> >     }
> >
> > Signed-off-by: Benoit Hudzia <benoit.hud...@sap.com>
> > Signed-off-by: Petter Svard <pett...@cs.umu.se>
> > Signed-off-by: Aidan Shribman <aidan.shrib...@sap.com>
>
> So if I understand correctly, this enables delta updates for dirty
> pages? Would it be possible to do the same on the block layer, so that
> the VM backing file could potentially save the new information as a
> delta over the old block? Especially with metadata updates, that could
> save quite some disk space.
>
> Of course that would mean that a block is no longer the size of a
> block :). Maybe something to consider for qcow3?
This is a good idea for a transport format, but I think it would
noticeably degrade the I/O performance of a running VM. Some file
systems also provide compression, but it is rarely used; the use case
is basically "Write Once Read Many" archiving. In other scenarios I
don't think this will work well.

I/O request size is restricted to multiples of the host device block
size (e.g. 512 bytes or 4 KB), so it isn't trivial to pack
sub-block-sized data. And since disk I/O is slow, the image format
either needs to be simple or needs a significantly superior data
structure that makes up for the additional metadata.

VMDK has a "stream-optimized" format and QCOW2 supports compression,
but I don't think either does delta compression. There may also be
limits on how compact the image file stays when you rewrite data.

Stefan