On 05.04.2013 at 22:00, Paolo Bonzini <pbonz...@redhat.com> wrote:

> On 05/04/2013 21:23, Kevin Wolf wrote:
>>>> virtually all dup pages are zero pages. remove
>>>> the special is_dup_page() function and use the
>>>> optimized buffer_find_nonzero_offset() function
>>>> instead.
>>>>
>>>> here buffer_find_nonzero_offset() is used directly
>>>> to avoid the unnecessary additional checks in
>>>> buffer_is_zero().
>>>>
>>>> raw performance gain checking 1 GByte zeroed memory
>>>> over is_dup_page() is approx. 10-12% with SSE2
>>>> and 8-10% with unsigned long arithmetic.
>>>>
>>>> Signed-off-by: Peter Lieven <p...@kamp.de>
>>>> Reviewed-by: Orit Wasserman <owass...@redhat.com>
>>>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>> Okay, so I bisected again and this is the second patch that is involved
>> in the slowness of qemu-iotests case 007.
>>
>> The problem seems to be that the RAM of a guest is in fact _not_ zeroed
>> during initialisation. It hits my test case reliably because I'm running
>> with MALLOC_PERTURB_. Now I'm wondering if in practice this happens only
>> under such test conditions, or if real guests could be affected as well
>> and we should make sure to get zeroed memory for RAM.
>
> I think we should MADV_DONTNEED it.

As far as I know, this does not guarantee that the memory is unmapped.
Sadly, I think we have to revert

    migration: do not sent zero pages in bulk stage

The memory assigned by posix_memalign is most likely zero, as all GFP_USER
pages are zeroed out by the kernel on allocation (at least under Linux),
but if a page is reused within the same process it is not necessarily
zero anymore.

The problem I was trying to fix with that patch is that the memset on
receiving a zero page at the target allocated memory, and the
MADV_DONTNEED did not immediately drop the page. This led to memory
pressure, swapping etc. on overcommitted systems.

What I would propose as a solution, after reverting the "do not sent zero
pages" patch, is something like this when receiving a compressed page:

    if (ch != 0 || !is_zero_page(host)) {
        memset(host, ch, TARGET_PAGE_SIZE);
    }

Regarding Kevin's observation of the speed regression in iotest 007: when
MALLOC_PERTURB_ is used, there are simply no zero pages in memory, only
dup pages. On a real system the observation is that pages are either zero
or not dup at all.

Peter

>
> Paolo
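
For reference, the receive-side check proposed above can be written out as a small standalone sketch. This is not the actual QEMU code: the 4 KiB value for TARGET_PAGE_SIZE, the byte-loop is_zero_page() and the wrapper name load_compressed_page() are assumptions made only for illustration. A real implementation would presumably use an optimized scan such as buffer_find_nonzero_offset().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages; in QEMU this value depends on the target. */
#define TARGET_PAGE_SIZE 4096

/*
 * Hypothetical helper: returns true if every byte of the page is zero.
 * A plain byte loop stands in for an optimized zero-scan routine.
 */
static bool is_zero_page(const uint8_t *p)
{
    for (size_t i = 0; i < TARGET_PAGE_SIZE; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical wrapper around the proposed check: fill a received
 * compressed (dup) page with the byte ch, but skip the write when the
 * page should become zero and already reads as zero. Skipping the
 * memset avoids writing to a page that is already all zeroes.
 * Returns true if the memset was performed.
 */
static bool load_compressed_page(uint8_t *host, uint8_t ch)
{
    if (ch != 0 || !is_zero_page(host)) {
        memset(host, ch, TARGET_PAGE_SIZE);
        return true;
    }
    return false;
}
```

The check only reads the destination page in the ch == 0 case, so non-zero dup pages pay no extra cost, and a page that was reused and is no longer zero is still cleared correctly.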