On 04/26/2017 09:52 PM, Christian Borntraeger wrote:
> On 04/26/2017 09:04 PM, Dr. David Alan Gilbert wrote:
>> * Christian Borntraeger (borntrae...@de.ibm.com) wrote:
>>> On 04/26/2017 08:37 PM, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>>>
>>>> When an all-zero page is received during the precopy
>>>> phase of a postcopy-enabled migration we must force
>>>> allocation otherwise accesses to the page will still
>>>> get blocked by userfault.
>>>>
>>>> Symptom:
>>>> a) If the page is accessed by a device during device-load
>>>> then we get a deadlock as the source finishes sending
>>>> all its pages but the destination device-load is still
>>>> paused and so doesn't clean up.
>>>>
>>>> b) If the page is accessed later, then the thread will stay
>>>> paused until the end of migration rather than carrying on
>>>> running, until we release userfault at the end.
>>>>
>>>> Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>>>> Reported-by: Christian Borntraeger <borntrae...@de.ibm.com>
>>>
>>> CC stable? after all the guest hangs on both sides
>>>
>>> Has survived 40 migrations (usually failed at the 2nd)
>>> Tested-by: Christian Borntraeger <borntrae...@de.ibm.com>
>>
>> Great...but.....
>> Andrea (added to the mail) says this shouldn't be necessary.
>> The read we were doing in the is_zero_range() should have been sufficient
>> to get the page mapped and that zero page should have survived.
>
> We do not do is_zero_range if ch==0 because of lazy evaluation, no?

Sorry, I misread that code. We do call is_zero_range() when ch == 0.
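
To spell out the evaluation order, here is a minimal standalone sketch. It assumes the check has the shape "ch != 0 || !zero-check(...)"; that shape, and the names fill_page(), range_is_zero() and zero_check_calls, are made up for illustration and are not copied from the tree. With ch == 0 the left operand of || is false, so the right operand has to be evaluated and the page does get read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts zero-check calls so the evaluation order is observable.
 * Illustration only. */
static int zero_check_calls;

/* Hypothetical stand-in for a zero-range check: reads the buffer
 * until it finds a non-zero byte. */
static bool range_is_zero(const uint8_t *buf, size_t len)
{
    zero_check_calls++;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical fill helper, assuming the condition shape discussed
 * above. Returns true if the page was written, false if it was only
 * read. */
static bool fill_page(uint8_t *host, uint8_t ch, size_t len)
{
    /* ch != 0: || short-circuits, the zero check is skipped.
     * ch == 0: left side is false, so the zero check runs. */
    if (ch != 0 || !range_is_zero(host, len)) {
        memset(host, ch, len);
        return true;
    }
    return false;
}
```

So the only case that skips the read is ch != 0, where the memset touches the page anyway.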