end into page req mutex

Fabiano Rosas Mon, 02 Jun 2025 10:47:11 -0700

Peter Xu <pet...@redhat.com> writes:

> The postcopy blocktime feature was tricky that it used quite some atomic
> operations over quite a few arrays and vars, without explaining how that
> would be thread safe.  The thread safety here is about concurrency between
> the fault thread and the fault resolution threads, possible to access the
> same chunk of data.  All these atomic ops can be expensive too before
> knowing clearly how it works.
>
> OTOH, postcopy has one page_request_mutex used to serialize the received
> bitmap updates.  So far it's ok - we don't yet have a lot of threads
> contending the lock.  It might change after multifd will be supported, but
> that's a separate story.  What is important is, with that mutex, it's
> pretty lightweight to move all the blocktime maintenance into the mutex
> critical section.  It's because the blocktime layer is lightweighted:
> almost "remember which vcpu faulted on which address", and "ok we get some
> fault resolved, calculate how long it takes".  It's also an optional
> feature for now (but I have thought of changing that, maybe in the future).
>
> Let's push the blocktime layer into the mutex, so that it's always
> thread-safe even without any atomic ops.
>
> To achieve that, I'll need to add a tid parameter on fault path so that
> it'll start to pass the faulted thread ID into deeper the stack, but not
> too deep.  When at it, add a comment for the shared fault handler (for
> example, vhost-user devices running with postcopy), to mention a TODO.  One
> reason it might not be trivial is that vhost-user's userfaultfds should be
> opened by vhost-user process, so it's pretty hard to control making sure
> the TID feature will be around.  It wasn't supported before, so keep it
> like that for now.
>
> Now we should be as ease when everything is protected by a mutex that we
> always take anyway.
>
> One side effect: we can finally remove one ramblock_recv_bitmap_test() in
> mark_postcopy_blocktime_begin(), which was pretty weird and which also
> includes a weird (but maybe necessary.. but maybe not?) operation to inject
> a blocktime entry then quickly erase it..  When we're with the mutex, and
> when we make sure it's invoked after checking the receive bitmap, it's not
> needed anymore.  Instead, we assert.
>
> As another side effect, this paves way for removing all atomic ops in all
> the mem accesses in blocktime layer.
>
> Note that we need a stub for mark_postcopy_blocktime_begin() for Windows
> builds.
>
> Signed-off-by: Peter Xu <pet...@redhat.com>


Reviewed-by: Fabiano Rosas <faro...@suse.de>

Re: [PATCH 02/13] migration/postcopy: Push blocktime start/end into page req mutex

Reply via email to