On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 21:34, Peter Xu wrote:
> > On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.02.2025 18:54, Peter Xu wrote:
> > > > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > > > +static int multifd_device_state_save_thread(void *opaque)
> > > > > +{
> > > > > +    struct MultiFDDSSaveThreadData *data = opaque;
> > > > > +    int ret;
> > > > > +
> > > > > +    ret = data->hdlr(data->idstr, data->instance_id,
> > > > > +                     &send_threads_abort, data->handler_opaque);
> > > >
> > > > I thought we discussed somewhere and the plan was we could use Error**
> > > > here to report errors.  Would that still make sense, or maybe I lost
> > > > some context?
> > >
> > > That was about *load* threads, here these are *save* threads.
> >
> > Ah OK.
> >
> > >
> > > Save handlers do not return an Error value, neither save_live_iterate, nor
> > > save_live_complete_precopy or save_state does so.
> >
> > Let's try to make new APIs work with Error* if possible.
>
> Let's assume that these threads return an Error object.
>
> What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?

IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
context, as the Error* will be used in one of the threads of the pool, not
the migration thread.

The goal is to be able to set the Error* with migrate_set_error(), so that
when migration fails, query-migrate can return the error to libvirt;
migration always tries to remember the 1st error hit if ever possible.

It's multifd_device_state_save_thread() that would do migrate_set_error(),
not the migration thread.  qemu_savevm_state_complete_*() are indeed not
ready to pass Errors, but they're not in the stack being discussed.

> Both it and its caller qemu_savevm_state_complete_precopy() only handle int
> errors.
>
> qemu_savevm_state_complete_precopy() in turn has 4 callers, half of which (2)
> also would need to be enlightened with Error handling somehow.

Right, we don't need to touch those, as explained above.

Generally speaking, IMHO it's always good to add new code with Error*
reports rather than retvals in migration, even if the new code is in the
migration thread stack.  It makes future changes to switch to Error*
easier.

> > > >
> > > > Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
> > > > send_threads_ret).  Can we make MultiFDDSSaveThreadData the only interface
> > > > between migration and the threads impl?  So I wonder if it can be:
> > > >
> > > >          ret = data->hdlr(data);
> > > >
> > > > With extended struct like this (I added thread_error and thread_quit):
> > > >
> > > > struct MultiFDDSSaveThreadData {
> > > >     SaveLiveCompletePrecopyThreadHandler hdlr;
> > > >     char *idstr;
> > > >     uint32_t instance_id;
> > > >     void *handler_opaque;
> > > >     /*
> > > >      * Should be NULL when struct passed over to thread, the thread should
> > > >      * set this if the handler would return false.  It must be kept NULL if
> > > >      * the handler returned true / success.
> > > >      */
> > > >     Error *thread_error;
> > >
> > > As I mentioned above, these handlers do not generally return Error type,
> > > so this would need to be an *int;
> > >
> > > >     /*
> > > >      * Migration core would set this when it wants to notify thread to
> > > >      * quit, for example, when error occured in other threads, or migration is
> > > >      * cancelled by the user.
> > > >      */
> > > >     bool thread_quit;
> > >
> > > ^ I guess that was supposed to be a pointer too (*thread_quit).
> >
> > It's my intention to make this bool, to make everything managed per-thread.
>
> But that's unnecessary since this flag is common to all these threads.

One bool would be enough, but then you'll need to export another API for
VFIO to use.  I suppose that's OK too.

Some context on multifd threads and how that's done there: we started with
one "quit" per thread struct, but then we switched to one bool exactly as
you said, see commit 15f3f21d598148.

If you want to stick with one bool, that's okay too: you can export
something similar in misc.h, e.g. multifd_device_state_save_thread_quitting(),
then we can avoid passing in the "quit" either as a handler parameter or as
a per-thread flag.

> > It's actually what we do with multifd, these are a bunch of extra threads
> > to differeciate from the "IO threads" / "multifd threads".
> >
> > > > };
> > > >
> > > > Then if any multifd_device_state_save_thread() failed, for example, it
> > > > should notify all threads to quit by setting thread_quit, instead of
> > > > relying on yet another global variable to show migration needs to quit.
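
To make the Error* side concrete, here is roughly what I mean for the
thread function plus the misc.h helper mentioned above.  Completely
untested sketch; it assumes the handler takes the data struct and fills
data->thread_error on failure, and it keeps your single send_threads_abort
bool rather than a per-thread flag:

===8<===
static int multifd_device_state_save_thread(void *opaque)
{
    struct MultiFDDSSaveThreadData *data = opaque;

    if (data->hdlr(data) < 0) {
        /* The handler must have filled data->thread_error on failure */
        assert(data->thread_error);

        /* Remember the 1st error hit, so query-migrate can report it */
        migrate_set_error(migrate_get_current(), data->thread_error);

        /* Ask the sibling save threads to quit too */
        qatomic_set(&send_threads_abort, true);
    }

    /* The retval stops mattering once the Error carries the failure */
    return 0;
}

/* Exported in misc.h, so e.g. VFIO can poll it from inside its handler */
bool multifd_device_state_save_thread_quitting(void)
{
    return qatomic_read(&send_threads_abort);
}
===8<===

(migrate_set_error() only takes a copy of the Error, so the per-thread one
would still be freed together with the struct at cleanup time.)
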
> > >
> > > multifd_abort_device_state_save_threads() needs to access
> > > send_threads_abort too.
> >
> > This may need to become something like:
> >
> >   QLIST_FOREACH() {
> >       MultiFDDSSaveThreadData *data = ...;
> >       data->thread_quit = true;
> >   }
>
> At the most basic level that's turning O(1) operation into O(n).
>
> Besides, it creates a question now who now owns these MultiFDDSSaveThreadData
> structures - they could be owned by either thread pool or the
> multifd_device_state code.

I think they should be owned by migration, and with this idea they will
need to stay around until the thread pool has completed its work, so the
migration core needs to free them.

> Currently the ownership is simple - the multifd_device_state code
> allocates such per-thread structure in
> multifd_spawn_device_state_save_thread() and immediately passes its
> ownership to the thread pool which takes care to free it once it no
> longer needs it.

Right, this is another reason why I think having migration owning these
structs is better.

We used to have dangling task issues when we shifted ownership of something
to the mainloop and then lost track of it (e.g. the TLS handshake
gsources).  Those are pretty hard to debug when they hang, because the
migration core has nothing left to link back to the hung tasks.

I think we should start from having the migration core able to reach these
thread-based tasks when needed.  Migration also has control of the thread
pool, which would make it easier.  The thread pool is still simple so far,
so we may still need to be able to reference per-task info separately.

> Now, with the list implementation if the thread pool were to free
> that MultiFDDSSaveThreadData it would also need to release it from
> the list.
>
> Which in turn would need appropriate locking around this removal
> operation and probably also each time the list is iterated over.
>
> On the other hand if the multifd_device_state code were to own
> that MultiFDDSSaveThreadData then it would linger around until
> multifd_device_state_send_cleanup() cleans it up even though its
> associated thread might be long gone.

Do you see a problem with it?  It sounds good to me actually.. and pretty
easy to understand.  So migration creates these MultiFDDSSaveThreadData,
then creates threads to enqueue them, then waits for all threads to
complete, then frees these structs.

> > We may want to double check qmp 'migrate_cancel' will work when save
> > threads are running, but this can also be done for later.
> >
> > > And multifd_join_device_state_save_threads() needs to access
> > > send_threads_ret.
> >
> > Then this one becomes:
> >
> >   thread_pool_wait(send_threads);
> >   QLIST_FOREACH() {
> >       MultiFDDSSaveThreadData *data = ...;
> >       if (data->thread_error) {
> >           return false;
> >       }
> >   }
> >   return true;
>
> Same here, having a common error return would save us from having
> to iterate over a list (or having a list in the first place).

IMHO perf isn't an issue here.  It's a slow path, the number of threads is
small, and the loop is cheap.  I'd prefer to prioritize cleanliness in this
case.

Otherwise, any suggestion on how we could report an Error* from the threads?

> >
> > > These variables ultimately will have to be stored somewhere since
> > > there can be multiple save threads and so multiple instances of
> > > MultiFDDSSaveThreadData.
> > >
> > > So these need to be stored somewhere where
> > > multifd_spawn_device_state_save_thread() can reach them to assign
> > > their addresses to MultiFDDSSaveThreadData members.
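
With a list, all of the functions above can reach them.  Roughly something
like this (untested; the send_thread_data list and the "node" QLIST_ENTRY()
field are made-up names, this shows the per-thread thread_quit variant, and
it glosses over the locking question you raised - the existing
queue_job_mutex could cover insertion vs. iteration if needed):

===8<===
static QLIST_HEAD(, MultiFDDSSaveThreadData) send_thread_data =
    QLIST_HEAD_INITIALIZER(send_thread_data);

void multifd_abort_device_state_save_threads(void)
{
    struct MultiFDDSSaveThreadData *data;

    QLIST_FOREACH(data, &send_thread_data, node) {
        qatomic_set(&data->thread_quit, true);
    }
}

bool multifd_join_device_state_save_threads(void)
{
    struct MultiFDDSSaveThreadData *data;
    bool ok = true;

    thread_pool_wait(send_threads);

    QLIST_FOREACH(data, &send_thread_data, node) {
        if (data->thread_error) {
            ok = false;
        }
    }

    /*
     * The structs stay on the list; multifd_device_state_send_cleanup()
     * frees them (and their thread_error, if any) at the end.
     */
    return ok;
}
===8<===
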
> >
> > Then multifd_spawn_device_state_save_thread() will need to manage the
> > qlist, making sure migration core remembers what jobs it submitted.  It
> > sounds good to have that bookkeeping when I think about it, instead of
> > throw the job to the thread pool and forget it..
>
> It's not "forgetting" about the job but rather letting thread pool
> manage it - I think thread pool was introduced so these details
> (thread management) are abstracted from the migration code.
> Now they would be effectively duplicated in the migration code.

Migration is still managing those as long as you have send_threads_abort,
isn't it?  The thread pool doesn't yet have an API to say "let's quit all
the tasks"; otherwise I'd be OK with using the pool API instead of having
thread_quit.

> > >
> > > However, at that point multifd_device_state_save_thread() can
> > > access them too so it does not need to have them passed via
> > > MultiFDDSSaveThreadData.
> > >
> > > However, nothing prevents putting send_threads* variables
> > > into a global struct (with internal linkage - "static", just as
> > > these separate ones are) if you like such construct more.
> >
> > This should be better than the current global vars indeed, but less
> > favoured if the per-thread way could work above.
>
> You still need that list to be a global variable,
> so it's the same amount of global variables as just putting
> the existing variables in a struct (which could be even allocated
> in multifd_device_state_send_setup() and deallocated in
> multifd_device_state_send_cleanup() for extra memory savings).

Yes, this works for me.  I think you got me wrong on "not allowing to
introduce global variables".  I'm OK with that, but please still consider..

  - Putting it under some existing global object rather than having
    separate global variables all over the place..

  - Having Error reports

And I still think we can change:

typedef int (*SaveLiveCompletePrecopyThreadHandler)(char *idstr,
                                                    uint32_t instance_id,
                                                    bool *abort_flag,
                                                    void *opaque);

To:

typedef int (*SaveLiveCompletePrecopyThreadHandler)(MultiFDDSSaveThreadData*);

No matter what.

> These variables are having internal linkage limited to (relatively
> small) multifd-device-state.c, so it's not like they are polluting
> namespace in some major migration translation unit.

If someone proposes to introduce 100 global vars in multifd-device-state.c,
I'll strongly stop that.  If it's one global var, I'm OK.  What if it's 5?

===8<===
static QemuMutex queue_job_mutex;

static ThreadPool *send_threads;
static int send_threads_ret;
static bool send_threads_abort;

static MultiFDSendData *device_state_send;
===8<===

I think this is where I should start calling a stop.  That's what
happened..  Please consider introducing something like
multifd_send_device_state, so we can avoid anyone randomly adding static
global vars there in the future.

> Taking into consideration having to manage an extra data structure
> (list), needing more code to do so, having worse algorithms I don't
> really see a point of using that list.
>
> (This is orthogonal to whether the thread return type is changed to
> Error which could be easily done on the existing save threads pool
> implementation).

My bet is that changing to a list is just as easy (10-20 LOC?).  If not, I
can try to provide a diff on top of your patch.  I'm also not strictly
asking for a list, but for anything that makes the API cleaner (fewer
globals, better error reports, etc.).
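
For instance, the kind of wrapper object I mean would be roughly the below
(untested; the struct and field names are only illustrative, mirroring the
statics quoted above, with send_threads_ret dropped in favour of Error
reporting):

===8<===
typedef struct {
    QemuMutex queue_job_mutex;

    ThreadPool *threads;
    bool threads_abort;

    MultiFDSendData *device_state;

    /* Per-thread job data, if we go the list route */
    QLIST_HEAD(, MultiFDDSSaveThreadData) thread_data;
} MultiFDSendDeviceState;

static MultiFDSendDeviceState *multifd_send_device_state;
===8<===

It could then be allocated in multifd_device_state_send_setup() and freed
in multifd_device_state_send_cleanup(), as you suggested above.

Thanks,

-- 
Peter Xu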