On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> It's not that page faults happen during multifd. The page was already
> sent during precopy, but multifd-recv didn't write to it; it just marked
> it in the receivedmap. When postcopy starts, the page gets accessed and
> faults. Since postcopy is on, the migration wants to request the page
> from the source, but it's present in the receivedmap, so it doesn't
> ask. No page ever arrives and the code hangs waiting for the page fault
> to be serviced (or potentially faults continuously? I'm not sure of the
> details).

I think your previous analysis of the zero pages is correct. I'm not 100%
sure that's the issue, but it very likely is. I also tend to agree with you
that we could skip the zero-page optimization in the multifd code when
postcopy is enabled (maybe with a comment right above it explaining why).

Thanks,

-- 
Peter Xu

