On 06/10/2013 04:50 PM, Peter Lieven wrote:
> On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
>> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>>> On 09.06.2013 at 05:09, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>>
>>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>>> I had this in mind, but then chose the other
>>>>>>>>>>>>>>>> approach... which turned out to be a bad idea.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alexey: I will prepare a patch later today;
>>>>>>>>>>>>>>>> could you then please verify that it fixes your
>>>>>>>>>>>>>>>> problem?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Paolo: would we still need the madvise, or is
>>>>>>>>>>>>>>>> it enough to not write the zeroes?
>>>>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>>> them, even at the source.
>>>>>>>>>>>>> It doesn't look like it. I tried this program and top
>>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>>> memory:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>> int main(void)
>>>>>>>>>>>>> {
>>>>>>>>>>>>>     char *x = malloc(500 << 20);  /* 500 MB: large enough to be mmap()ed */
>>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>>
>>>>>>>>>>>>>     /* Read (never write) one byte per 4 KiB page, 10 MB per step. */
>>>>>>>>>>>>>     for (i = 0; x && i < 500; i += 10) {
>>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>>>>             *(volatile char *) (x + (i << 20) + j);
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>         getchar();  /* pause so top can be checked */
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     free(x);
>>>>>>>>>>>>>     return 0;
>>>>>>>>>>>>> }
>>>>>>>>>>>> Strange.
>>>>>>>>>>>> We are talking about RSS size, right?
>>>>>>>>>>> None of the three top values change, and only VIRT is
>>>>>>>>>>> 500 MB.
>>>>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> Which kernel version do you use?
>>>>>>>>>>> 3.9.
>>>>>>>>>>>
>>>>>>>>>>>> What avoids allocating the memory for me is the
>>>>>>>>>>>> following (with whatever side effects it has ;-))
>>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>>
>>>>>>>>>>> Paolo
>>>>>>>>>> The following also does not allocate memory, but QEMU
>>>>>>>>>> does...
>>>>>>>>> Hi, Peter. As the patch description says:
>>>>>>>>>
>>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>>> at the source but not at the destination."
>>>>>>>>>
>>>>>>>>> I don't understand why that would be a problem. Shouldn't all
>>>>>>>>> pages not received at the destination be treated as zero pages?
>>>>>>>>
>>>>>>>> How would the destination guest know if some page must be
>>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>>> send anything for the pages which were zero on the source
>>>>>>>> side.
>>>>>>> If a page was not received, and the destination knows that the page
>>>>>>> should exist according to the total size, fill it with zeroes at
>>>>>>> the destination. Would that solve the problem?
>>>>>> It is _live_ migration: the source sends changes, and the same pages
>>>>>> can change and be sent several times. So we would need to turn on
>>>>>> tracking at the destination to know whether a page was received from
>>>>>> the source or changed by the destination itself (by writing
>>>>>> BIOS/firmware images there, etc.), and then clear the pages which were
>>>>>> touched by the destination and were not sent by the source.
>>>>> OK, I understand the problem now. For example: the destination boots
>>>>> up with 0x0000-0xFFFF filled with a BIOS image. The source forgot to
>>>>> send the zero pages in 0x0000-0xFFFF.
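Wenchao's proposal (record which pages the destination actually received, then zero-fill the rest before the guest resumes) can be sketched as follows. This is a minimal illustration under stated assumptions, not QEMU code: `struct dest_ram`, `dest_ram_init()`, `dest_ram_store()`, `dest_ram_zero_unreceived()` and `SKETCH_PAGE_SIZE` are hypothetical names, and a plain `bool` array stands in for a real dirty bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page size; stands in for QEMU's TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical destination-side view of one guest RAM block. */
struct dest_ram {
    uint8_t *base;   /* guest RAM as mapped on the destination */
    size_t npages;   /* number of pages in the block */
    bool *received;  /* one flag per page: did the source send it? */
};

static int dest_ram_init(struct dest_ram *r, uint8_t *base, size_t npages)
{
    r->base = base;
    r->npages = npages;
    r->received = calloc(npages, sizeof(bool)); /* all flags start false */
    return r->received ? 0 : -1;
}

/* Called for every page that arrives in the migration stream. */
static void dest_ram_store(struct dest_ram *r, size_t page, const uint8_t *data)
{
    assert(page < r->npages);
    memcpy(r->base + page * SKETCH_PAGE_SIZE, data, SKETCH_PAGE_SIZE);
    r->received[page] = true;
}

/* Before resuming: a page the source never sent is assumed to be zero on
 * the source, so clear whatever the destination itself put there (for
 * example a firmware image loaded at startup). Returns pages cleared. */
static size_t dest_ram_zero_unreceived(struct dest_ram *r)
{
    size_t cleared = 0;
    for (size_t i = 0; i < r->npages; i++) {
        if (!r->received[i]) {
            memset(r->base + i * SKETCH_PAGE_SIZE, 0, SKETCH_PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}

static void dest_ram_destroy(struct dest_ram *r)
{
    free(r->received);
    r->received = NULL;
}
```

The sketch only stays correct if a page that was sent once and later became zero on the source is sent again; a scheme that never sends zero pages at all would leave the older, nonzero copy in place on the destination, because that page is already marked as received.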
>>>>
>>>> The source did not forget; instead, it zeroed these pages during its
>>>> life and assumed that they must already be zeroed at the destination
>>>> (as the destination had not started and had no chance to write
>>>> anything there).
>>>>
>>>>
>>>>> After migration, the destination's 0x0000-0xFFFF is dirty (different
>>>>> from the source).
>>>> Yep. And those pages were empty on the source, which made debugging
>>>> very easy :)
>>>>
>>>>
>>>>> Thanks for the explanation.
>>>>>
>>>>> This seems to concern the migration protocol: how the guest should
>>>>> treat unsent pages. The patch causing the problem actually treats
>>>>> zero pages as "not to be sent" at the source, but the other half is
>>>>> missing: treating "not received" as zero pages at the destination.
>>>>> I guess if the second half is added, the problem is gone: after the
>>>>> page transfer completes, and before the destination resumes, fill
>>>>> the "not received" pages with zeroes.
>>>>
>>>>
>>>> Make a working patch and we'll discuss it :) I do not see much
>>>> acceleration coming from there.
>>> I would also not spend much time on this. I would either look for an
>>> easy way to fix the initialization code so that it does not
>>> unnecessarily load data into RAM, or I will send a v2 of my patch
>>> addressing Eric's concerns.
>> There is no easy way to implement the flag and keep your original patch,
>> as we would have to implement this flag in all architectures which got
>> broken by your patch, and I personally can fix only PPC64-pseries but not
>> the others.
>>
>> Furthermore, your revert + new patches perfectly solve the problem, so why
>> would we want to bother now with this new flag which nobody really needs
>> right now?
>>
>> Please, please, revert the original patch or I'll try to do it :)
>>
>>
> I tried, but there were concerns from the community.
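The failure Alexey describes can be reproduced with a toy model, assuming the source simply skips zero pages during one pass and the destination RAM was pre-filled (for example by a firmware image) before migration. `page_is_zero()`, `bulk_pass_skip_zero()` and `SKETCH_PAGE_SIZE` are hypothetical names; the byte-by-byte scan is a deliberately simple stand-in for QEMU's optimized zero check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size; stands in for QEMU's TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Scan a page for nonzero bytes; true if every byte is zero. */
static bool page_is_zero(const uint8_t *p)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Toy model of a pass that skips zero pages: only nonzero pages are
 * "sent" (copied) to the destination. Returns the number of pages skipped. */
static size_t bulk_pass_skip_zero(const uint8_t *src, uint8_t *dst, size_t npages)
{
    size_t skipped = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < npages; i++) {
        const uint8_t *s = src + i * SKETCH_PAGE_SIZE;
        if (page_is_zero(s)) {
            skipped++; /* nothing sent: dst keeps whatever it already had */
            continue;
        }
        memcpy(dst + i * SKETCH_PAGE_SIZE, s, SKETCH_PAGE_SIZE);
    }
    return skipped;
}
```

With the destination pre-filled with 0xAA and the first source page all zero, that page is skipped and keeps 0xAA after the pass, so the two copies of RAM differ even though every nonzero page was transferred.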
Was there anybody here who did not want to revert the patch (besides you)?
I did not notice.

> Alternatively, I found the following solution. Please drop the 2 patches
> and try the following:

How is it going to work if upstream QEMU doesn't send anything about empty
pages at all (this is why I want to revert that patch)?

>
> diff --git a/arch_init.c b/arch_init.c
> index 5d32ecf..458bf8c 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -799,6 +799,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                  while (total_ram_bytes) {
>                      RAMBlock *block;
>                      uint8_t len;
> +                    void *base;
> +                    ram_addr_t offset;
>  
>                      len = qemu_get_byte(f);
>                      qemu_get_buffer(f, (uint8_t *)id, len);
> @@ -822,6 +824,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>                          goto done;
>                      }
>  
> +                    base = memory_region_get_ram_ptr(block->mr);
> +                    for (offset = 0; offset < block->length;
> +                         offset += TARGET_PAGE_SIZE) {
> +                        if (!is_zero_page(base + offset)) {
> +                            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
> +                        }
> +                    }
> +
>                      total_ram_bytes -= length;
>                  }
>              }
>
> This is done at setup time, so there is no additional cost for zero
> checking on each compressed page coming in.
>
> Peter

--
Alexey
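The setup-time clearing in Peter's diff can be shown in isolation. This is a sketch assuming a flat buffer in place of a `RAMBlock`; `clear_nonzero_pages()`, `page_is_zero()` and `SKETCH_PAGE_SIZE` are hypothetical stand-ins for the loop over `block->length`, `is_zero_page()` and `TARGET_PAGE_SIZE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size; stands in for QEMU's TARGET_PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* Simple stand-in for is_zero_page(): true if every byte is zero. */
static bool page_is_zero(const uint8_t *p)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Clear every page of a RAM block that is not already zero, mirroring the
 * loop added to ram_load(). Returns the number of pages that were cleared. */
static size_t clear_nonzero_pages(uint8_t *base, size_t length)
{
    size_t cleared = 0;
    assert(length % SKETCH_PAGE_SIZE == 0);
    for (size_t offset = 0; offset < length; offset += SKETCH_PAGE_SIZE) {
        /* Only write when needed: a page that is already zero is just read. */
        if (!page_is_zero(base + offset)) {
            memset(base + offset, 0, SKETCH_PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}
```

Checking before calling memset() is what keeps this cheap with respect to the overcommit concern earlier in the thread: on Linux, reading untouched anonymous memory typically maps the shared zero page and does not grow RSS (consistent with Paolo's experiment), whereas writing zeroes unconditionally would allocate every page of guest RAM.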