On 04/27/2017 05:20 AM, Peter Xu wrote:
> On Wed, Apr 26, 2017 at 09:37:43PM +0200, Andrea Arcangeli wrote:
>> Hello,
>>
>> On Wed, Apr 26, 2017 at 08:04:43PM +0100, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntrae...@de.ibm.com) wrote:
>>>> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>>>>
>>>>> When an all-zero page is received during the precopy
>>>>> phase of a postcopy-enabled migration we must force
>>>>> allocation otherwise accesses to the page will still
>>>>> get blocked by userfault.
>>>>>
>>>>> Symptom:
>>>>> a) If the page is accessed by a device during device-load
>>>>> then we get a deadlock as the source finishes sending
>>>>> all its pages but the destination device-load is still
>>>>> paused and so doesn't clean up.
>>>>>
>>>>> b) If the page is accessed later, then the thread will stay
>>>>> paused until the end of migration rather than carrying on
>>>>> running, until we release userfault at the end.
>>>>>
>>>>> Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>>>>> Reported-by: Christian Borntraeger <borntrae...@de.ibm.com>
>>>>
>>>> CC stable? after all the guest hangs on both sides
>>>>
>>>> Has survived 40 migrations (usually failed at the 2nd)
>>>> Tested-by: Christian Borntraeger <borntrae...@de.ibm.com>
>>>
>>> Great...but.....
>>> Andrea (added to the mail) says this shouldn't be necessary.
>>> The read we were doing in the is_zero_range() should have been sufficient
>>> to get the page mapped and that zero page should have survived.
>>>
>>> So - I guess that's back a step, we need to figure out why the
>>> page disappears for you.
>>
>> Yes reading during precopy is enough to fill the hole and prevent
>> userfault missing faults to trigger.
>>
>> Somehow the pagetable must be mapped by a zeropage or a hugezeropage
>> or a regular page allocated during a previous precopy pass or a
>> pre-zeroed subpage part of a THP.
>>
>> Even if the hugezeropage is split later by a MADV_DONTNEED when
>> postcopy starts, they will become 4k zeropages.
>>
>> After a read succeeds, nothing (except MADV_DONTNEED or other explicit
>> syscalls which qemu would need to invoke explicitly between
>> is_zero_range and UFFDIO_REGISTER) should be able to bring the
>> pagetable back to its "pte_none/pmd_none" state that will then trigger
>> missing userfaults during postcopy later.

I have started instrumenting the kernel. I can see a set_pte_at for this
address, and shortly after that I see a (yet to be understood) invalidation,
which explains why I get a fault.

>
> No matter what finally the solution would be (after see Juan's
> comment, I am curious about whether is_zero_page() behaves differently
> in power now)... Dave, would it worth mentioning in

s390 is not power. :-) And we fall back to the normal buffer_zero_int, which
reads.

> ram_handle_compressed() about this read side-effect? Otherwise imho it
> might be hard for many people to quickly notice this.
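
For readers following the thread, here is a minimal sketch of the read side-effect under discussion. It is not the QEMU code: range_is_zero(), handle_fill_page() and the force_alloc flag are hypothetical names that only mirror the shape of the zero check and the forced allocation described in the patch. The point it illustrates is that skipping the memset for an all-zero page relies on the zero check having read the destination, since that read is what is expected to get the page mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the zero check. It has to read the
 * destination bytes, and on an unmapped anonymous page that read is
 * what is expected to fault in a zeropage. */
static bool range_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical receive path for a page filled with the single byte ch.
 * Returns true if the destination was written.
 * force_alloc is an assumption modelling "we must force allocation":
 * when set, the page is written even if it already reads as zero. */
static bool handle_fill_page(uint8_t *host, uint8_t ch, size_t len,
                             bool force_alloc)
{
    assert(host != NULL);

    if (ch != 0 || force_alloc || !range_is_zero(host, len)) {
        memset(host, ch, len);  /* the write allocates a real page */
        return true;
    }
    /* Only the read in range_is_zero() touched the page. */
    return false;
}
```

Without force_alloc, an all-zero page that already reads as zero is left untouched, so whether later accesses fault depends entirely on the mapping created by the read surviving until postcopy starts.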