On Mon, Sep 09, 2024 at 03:08:40PM -0400, Peter Xu wrote:
> On Mon, Sep 09, 2024 at 08:32:45PM +0200, Maciej S. Szmigiero wrote:
> > On 9.09.2024 19:59, Peter Xu wrote:
> > > On Thu, Sep 05, 2024 at 04:45:48PM +0300, Avihai Horon wrote:
> > > > 
> > > > On 27/08/2024 20:54, Maciej S. Szmigiero wrote:
> > > > > 
> > > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > > > 
> > > > > These SaveVMHandlers help a device provide its own asynchronous
> > > > > transmission of the remaining data at the end of a precopy phase.
> > > > > 
> > > > > In this use case, the save_live_complete_precopy_begin handler might
> > > > > be used to mark the stream boundary before proceeding with
> > > > > asynchronous transmission of the remaining data, while the
> > > > > save_live_complete_precopy_end handler might be used to mark the
> > > > > stream boundary after performing the asynchronous transmission.
> > > > > 
> > > > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
> > > > > ---
> > > > >    include/migration/register.h | 36 ++++++++++++++++++++++++++++++++++++
> > > > >    migration/savevm.c           | 35 +++++++++++++++++++++++++++++++++++
> > > > >    2 files changed, 71 insertions(+)
> > > > > 
> > > > > diff --git a/include/migration/register.h b/include/migration/register.h
> > > > > index f60e797894e5..9de123252edf 100644
> > > > > --- a/include/migration/register.h
> > > > > +++ b/include/migration/register.h
> > > > > @@ -103,6 +103,42 @@ typedef struct SaveVMHandlers {
> > > > >         */
> > > > >        int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
> > > > > 
> > > > > +    /**
> > > > > +     * @save_live_complete_precopy_begin
> > > > > +     *
> > > > > +     * Called at the end of a precopy phase, before all
> > > > > +     * @save_live_complete_precopy handlers and before launching
> > > > > +     * all @save_live_complete_precopy_thread threads.
> > > > > +     * The handler might, for example, mark the stream boundary before
> > > > > +     * proceeding with asynchronous transmission of the remaining data via
> > > > > +     * @save_live_complete_precopy_thread.
> > > > > +     * When postcopy is enabled, devices that support postcopy will skip this step.
> > > > > +     *
> > > > > +     * @f: QEMUFile where the handler can synchronously send data before returning
> > > > > +     * @idstr: this device section idstr
> > > > > +     * @instance_id: this device section instance_id
> > > > > +     * @opaque: data pointer passed to register_savevm_live()
> > > > > +     *
> > > > > +     * Returns zero to indicate success and negative for error
> > > > > +     */
> > > > > +    int (*save_live_complete_precopy_begin)(QEMUFile *f,
> > > > > +                                            char *idstr, uint32_t instance_id,
> > > > > +                                            void *opaque);
> > > > > +    /**
> > > > > +     * @save_live_complete_precopy_end
> > > > > +     *
> > > > > +     * Called at the end of a precopy phase, after @save_live_complete_precopy
> > > > > +     * handlers and after all @save_live_complete_precopy_thread threads have
> > > > > +     * finished. When postcopy is enabled, devices that support postcopy will
> > > > > +     * skip this step.
> > > > > +     *
> > > > > +     * @f: QEMUFile where the handler can synchronously send data before returning
> > > > > +     * @opaque: data pointer passed to register_savevm_live()
> > > > > +     *
> > > > > +     * Returns zero to indicate success and negative for error
> > > > > +     */
> > > > > +    int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque);
> > > > 
> > > > Is this handler necessary now that migration core is responsible for the
> > > > threads and joins them? I don't see VFIO implementing it later on.
> > > 
> > > Right, I spot the same thing.
> > > 
> > > This series added three hooks: begin, end, precopy_thread.
> > > 
> > > I think it only needs one, which is precopy_async.  My vague memory is
> > > that this is what we discussed before too, so that when migration precopy
> > > flushes the final round of iterable data, it does:
> > > 
> > >    (1) loop over all complete_precopy_async() and enqueue the tasks, if
> > >        any exist, into the migration worker pool.  Then,
> > > 
> > >    (2) loop over all complete_precopy() like before.
> > > 
> > > Optionally, we can enforce that a vmstate handler provides only one of
> > > complete_precopy_async() and complete_precopy().  In this case VFIO can
> > > update the two hooks during setup() by detecting multifd && !mapped_ram &&
> > > nocomp.
> > > 
> > 
> > The "_begin" hook is still necessary to mark the end of the device state
> > sent via the main migration stream (during the phase VM is still running)
> > since we can't start loading the multifd sent device state until all of
> > that earlier data finishes loading first.
> 
> Ah I remembered some more now, thanks.
> 
> If VFIO can send data during iterations, this new hook will also not be
> needed, right?
> 
> I remember you mentioned you'd have a look and see the challenges there; is
> there any conclusion yet on whether we can use multifd even during that phase?
> 
> It's also a pity that we introduce this hook only because we want a
> boundary between the "iterable stage" and the "final stage".  IIUC, if we
> had any kind of message telling the destination beforehand that "we're
> going to the last stage", then this hook could be avoided.  Right now it's
> at least inefficient, because we need to trigger begin() per device even
> though I think it's really a global request saying "we need to load all
> main stream data first before moving on".

Or, we could add one MIG_CMD_SWITCHOVER under QEMU_VM_COMMAND and send it at
the beginning of the switchover phase.  Then we would have a generic marker
on the destination for the boundary between "iterations" and "switchover".
I think we could then also drop the begin() here, avoiding one such sync per
device (and if others, like vdpa, have a similar need, they wouldn't require
such a flag either).

Fundamentally, that would turn VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE into a
migration flag.

But the best option is still for VFIO to enable multifd even during
iterations; then the boundary guard may not be needed at all.

> 
> > 
> > We shouldn't send that boundary marker in .save_live_complete_precopy
> > either, since it would mean unnecessarily waiting for other devices'
> > (not necessarily VFIO ones) .save_live_complete_precopy bulk data.
> > 
> > And VFIO SaveVMHandlers are shared across all VFIO devices (and const),
> > so we can't really change them at runtime.
> 
> In all cases, please consider dropping end() if it's never used; IMO it's
> fine if there is only begin(), and we shouldn't keep hooks that are never
> used.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu
