> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Wednesday, June 22, 2011 3:26 PM
>
> On Wed, Jun 22, 2011 at 1:01 PM, Anthony Liguori
> <anth...@codemonkey.ws> wrote:
> >>
> >> By using XBRLE (Xor Based Run-Length-Encoding) we can reduce required
> >> bandwidth for transfering of dirty memory pages during live migration
> >> migrate_set_cachesize<size>
> >> migrate -x<url>
> >
> > By how much?

See "Evaluation of delta compression techniques for efficient live
migration of large virtual machines"
(http://portal.acm.org/citation.cfm?id=1952698), subsection 5.2.3:
In the final test, a VM running a SAP Central Instance ERP system was
migrated over Gigabit Ethernet. With XBRLE, the total migration time was
reduced from 235 s to 139 s, the suspension time was reduced from 3 s to
0.2 s, and the ping downtime from 5 s to 1 s...

> >
> > This is a change to the live migration protocol, it would also require
> > documentation and an understanding of how it affects compatibility.

The default behavior (i.e. migrating without the -x option) has not
changed, so it remains compatible with previous qemu versions. When a
migration is initiated with XBRLE, the remote peer must also support
XBRLE, otherwise the migration will fail. With regard to documentation,
please advise.

> >
> > The patch really needs to be split into logical pieces too. It's a bit too
> > big for a meaningful review.
>
> Two places where you could consider splitting the patch is the caching
> and the sampling. Are they necessary for correctness and could they
> be submitted as follow-up patches to a core patch which does just the
> XBRLE?

Some changes were made to reduce code size:

(1) Sampling code, which was used for early detection of pages that had
    changed too much for XBRLE to be applicable, has been replaced with
    a simple check that the XBRLE delta does not exceed one third of
    the page size (4096 bytes).
(2) Check-summing/page-logging code, which existed only under a debug
    compile ifdef option, was removed.
(3) XBRLE migration statistics were replaced with a detailed
    'info migrate' output, which is vital for tracking the XBRLE
    operation.
(4) The 2-way associative cache was not separated from the XBRLE code,
    as the cache is a fundamental part of the XBRLE implementation
    (XBRLE updates are the difference between the new page and the old
    cached page on the sender side).
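
To make the coupling between the cached page and the encoder concrete, here is a rough sketch of the idea, not the code from the patch. The function names, the DEMO_* constants and the (run length, value) byte-pair layout are assumptions chosen for illustration; the actual XBRLE wire format may differ. It XORs the cached copy with the new page, run-length encodes the result, and gives up once the output would pass the one-third-of-a-page limit.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096                 /* page size quoted above */
#define DEMO_MAX_DELTA (DEMO_PAGE_SIZE / 3) /* one-third-of-a-page limit */

/*
 * Hypothetical encoder: XOR the cached page with the new page and store the
 * result as (run length, value) byte pairs. Returns the encoded length, or
 * -1 if it would exceed max_out, in which case the caller sends the page raw.
 */
static int xor_rle_encode(const uint8_t *cached, const uint8_t *current,
                          uint8_t *out, size_t max_out)
{
    size_t i = 0, o = 0;

    while (i < DEMO_PAGE_SIZE) {
        uint8_t v = cached[i] ^ current[i];
        size_t run = 1;

        /* Extend the run while the XOR value repeats (at most 255 bytes). */
        while (i + run < DEMO_PAGE_SIZE && run < 255 &&
               (uint8_t)(cached[i + run] ^ current[i + run]) == v) {
            run++;
        }
        if (o + 2 > max_out) {
            return -1; /* delta too large, not worth encoding */
        }
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return (int)o;
}

/*
 * Hypothetical decoder: apply the delta in place to the receiver's copy of
 * the old page. Returns 0 on success, -1 on malformed input.
 */
static int xor_rle_decode(const uint8_t *enc, size_t enc_len, uint8_t *page)
{
    size_t i = 0;

    if (enc_len % 2 != 0) {
        return -1;
    }
    for (size_t e = 0; e < enc_len; e += 2) {
        size_t run = enc[e];

        if (run == 0 || i + run > DEMO_PAGE_SIZE) {
            return -1;
        }
        for (size_t k = 0; k < run; k++) {
            page[i++] ^= enc[e + 1];
        }
    }
    return i == DEMO_PAGE_SIZE ? 0 : -1;
}
```

In this sketch a caller would pass DEMO_MAX_DELTA as max_out and fall back to sending the full page when the encoder returns -1. Without the old copy in the cache there is nothing to XOR against, which is why the cache cannot be split out.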

Currently I don't see how to split the code into smaller meaningful
pieces, so for now I have re-submitted a single patch with corrections
(see separate email [PATCH v2]).

>
> Also, whenever there are heuristics and use of floating point then
> there is some magic going on. It may be necessary and give a huge
> performance boost but needs explanation so it is not a black box or
> fragile mechanism once it has been merged upstream.

The code in question (which sampled a page to test whether it was
eligible for XBRLE encoding) has been removed.

>
> Stefan

Aidan