On 06/10/2013 06:44 PM, Peter Lieven wrote:
> On 10.06.2013 08:55, Alexey Kardashevskiy wrote:
>> On 06/10/2013 04:50 PM, Peter Lieven wrote:
>>> On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
>>>> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>>>>> On 09.06.2013 at 05:09, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>>>>
>>>>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>>>>> i had this in mind, but then chose the other
>>>>>>>>>>>>>>>>>> approach.... turned out to be a bad idea.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> alexey: i will prepare a patch later today,
>>>>>>>>>>>>>>>>>> could you then please verify it fixes your
>>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> paolo: would we still need the madvise or is
>>>>>>>>>>>>>>>>>> it enough to not write the zeroes?
>>>>>>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>>>>> them. even at the source.
>>>>>>>>>>>>>>> It doesn't look like.
>>>>>>>>>>>>>>> I tried this program and top
>>>>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>>>>> memory:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>>     char *x = malloc(500 << 20);
>>>>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>>>>>>             *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>         getchar();
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> strange. we are talking about RSS size, right?
>>>>>>>>>>>>> None of the three top values change, and only VIRT is 500 MB.
>>>>>>>>>>>>>> is the malloc above using mmapped memory?
>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> which kernel version do you use?
>>>>>>>>>>>>> 3.9.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> what avoids allocating the memory for me is the
>>>>>>>>>>>>>> following (with whatever side effects it has ;-))
>>>>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Paolo
>>>>>>>>>>>> the following does also not allocate memory, but qemu
>>>>>>>>>>>> does...
>>>>>>>>>>> Hi, Peter
>>>>>>>>>>> As the patch writes
>>>>>>>>>>>
>>>>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>>>>> at the source but not at the destination."
>>>>>>>>>>>
>>>>>>>>>>> I don't understand why it would be trouble, shouldn't all
>>>>>>>>>>> page not received in dest be treated as zero pages?
>>>>>>>>>> How would the destination guest know if some page must be
>>>>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>>>>> send anything for the pages which were zero on the source
>>>>>>>>>> side.
>>>>>>>>> If a page was not received and destination knows that page
>>>>>>>>> should exist according to total size, fill it with zero at
>>>>>>>>> destination, would it solve the problem?
>>>>>>>> It is _live_ migration, the source sends changes, same pages can
>>>>>>>> change and be sent several times.
>>>>>>>> So we would need to turn
>>>>>>>> tracking on on the destination to know if some page was received
>>>>>>>> from the source or changed by the destination itself (by writing
>>>>>>>> there bios/firmware images, etc) and then clear pages which were
>>>>>>>> touched by the destination and were not sent by the source.
>>>>>>> OK, I can understand the problem is, for example: Destination boots
>>>>>>> up with 0x0000-0xFFFF filled with bios image. Source forgot to send
>>>>>>> zero pages in 0x0000-0xFFFF.
>>>>>> The source did not forget, instead it zeroed these pages during its
>>>>>> life and thought that they must be zeroed at the destination already
>>>>>> (as the destination did not start and did not have a chance to write
>>>>>> something there).
>>>>>>
>>>>>>> After migration destination got 0x0000-0xFFFF dirty (different from
>>>>>>> source)
>>>>>> Yep. And those pages were empty on the source, which made debugging very
>>>>>> easy :)
>>>>>>
>>>>>>> Thanks for explaining.
>>>>>>>
>>>>>>> This seems to refer to the migration protocol: how should the guest
>>>>>>> treat unsent pages. The patch causing the problem actually treats
>>>>>>> zero pages as "not to send" at source, but another half is missing:
>>>>>>> treat "not received" as zero pages at destination. I guess if second
>>>>>>> half is added, problem is gone: after page transfer completed,
>>>>>>> before destination resume, fill zero in "not received" pages.
>>>>>>
>>>>>> Make a working patch, we'll discuss it :) I do not see much
>>>>>> acceleration coming from there.
>>>>> I would also not spend much time with this. I would either look to find
>>>>> an easy way to fix the initialization code to not unnecessarily load
>>>>> data into RAM or I will send a v2 of my patch following Eric's
>>>>> concerns.
>>>> There is no easy way to implement the flag and keep your original patch as
>>>> we have to implement this flag in all architectures which got broken by
>>>> your patch and I personally can fix only PPC64-pseries but not the others.
>>>>
>>>> Furthermore your revert + new patches perfectly solve the problem, why
>>>> would we want to bother now with this new flag which nobody really needs
>>>> right now?
>>>>
>>>> Please, please, revert the original patch or I'll try to do it :)
>>>>
>>> I tried, but there were concerns from the community.
>>
>> Was there anybody who did not want to revert the patch (besides you)?
>> I did not notice.
> Eric said I should not drop the skipped_pages stuff in the monitor.
>>
>>> Alternatively, I found
>>> the following solution. Please drop the 2 patches and try the
>>> following:
>>
>> How is it going to work if upstream QEMU doesn't send anything about empty
>> pages at all (this is why I want to revert that patch)?
> I do not understand your question. The patch below zeroes out the destination
> memory if it is not zero (e.g. if there is a BIOS copied to memory already
> during machine init).
>
> I would prefer not to completely drop the patch since it saves bandwidth and
> resources.

I would like migration to do what it should do: send pages no matter what, which is exactly what migration is for. If there are many, many empty pages (which I doubt is a very common real-life case), they could all be merged into big consecutive chunks and sent at the end of migration.

--
Alexey
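
For reference, the idea discussed in the quoted thread ("scan the page for nonzero values before writing it", and zeroing destination memory only "if it is not zero") could look roughly like the sketch below. This is not the patch from the thread and not QEMU code: the names `sketch_page_is_zero` and `sketch_receive_zero_page` and the fixed 4096-byte page size are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; real code would use the target's. */
#define SKETCH_PAGE_SIZE 4096

/* Return true if every byte of the page is zero (read-only scan). */
static bool sketch_page_is_zero(const unsigned char *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Destination side: handle a page that the source reported as zero.
 * The page is written only when it currently holds nonzero data
 * (e.g. a BIOS image copied during machine init).
 * Returns true if the page had to be cleared.
 */
static bool sketch_receive_zero_page(unsigned char *page)
{
    if (sketch_page_is_zero(page)) {
        return false;               /* already zero: skip the write */
    }
    memset(page, 0, SKETCH_PAGE_SIZE);
    return true;
}
```

The point of scanning first is that the destination writes only to pages that actually differ, so pages that were never touched are only read. Whether a read-only scan avoids allocation in practice depends on the kernel and on how the memory was mapped, as the test program quoted above was meant to check.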