On 5.02.2025 16:55, Peter Xu wrote:
On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
On 4.02.2025 21:34, Peter Xu wrote:
On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
On 4.02.2025 18:54, Peter Xu wrote:
On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
+static int multifd_device_state_save_thread(void *opaque)
+{
+    struct MultiFDDSSaveThreadData *data = opaque;
+    int ret;
+
+    ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
+                     data->handler_opaque);
I thought we discussed somewhere and the plan was we could use Error** here
to report errors. Would that still make sense, or maybe I lost some
context?
That was about *load* threads, here these are *save* threads.
Ah OK.
Save handlers do not return an Error value - neither save_live_iterate, nor
save_live_complete_precopy, nor save_state does so.
Let's try to make new APIs work with Error* if possible.
Let's assume that these threads return an Error object.
What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?
IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
context, as the Error* will be used in one of the threads of the pool, not
in the migration thread.
The goal is to be able to set Error* with migrate_set_error(), so that when
migration failed, query-migrate can return the error to libvirt, so
migration always tries to remember the 1st error hit if ever possible.
It's multifd_device_state_save_thread() that would do migrate_set_error(), not
the migration thread. qemu_savevm_state_complete_*() are indeed not ready to
pass Errors, but that's not in the discussed stack.
I understand what you are proposing now - you haven't written about using
migrate_set_error() for save threads earlier, just about returning an Error
object.
While this might work, it has a tendency to uncover errors in other parts of
the migration core - much as using it in the load threads case uncovered
the TLS session error.
(Speaking of which, could you please respond to the issue at the bottom of
this message from 2 days ago?:
https://lore.kernel.org/qemu-devel/150a9741-daab-4724-add0-f35257e86...@maciej.szmigiero.name/
It is blocking the rework of the TLS session EOF handling in this patch set.
Thanks.)
But I can try this migrate_set_error() approach here and see if something
breaks.
(..)
Meanwhile, I still feel uneasy about having these globals (send_threads_abort,
send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
between migration and the threads impl? So I wonder if it can be:
ret = data->hdlr(data);
With an extended struct like this (I added thread_error and thread_quit):

struct MultiFDDSSaveThreadData {
    SaveLiveCompletePrecopyThreadHandler hdlr;
    char *idstr;
    uint32_t instance_id;
    void *handler_opaque;
    /*
     * Should be NULL when the struct is passed over to the thread; the
     * thread should set this if the handler would return false. It must be
     * kept NULL if the handler returned true / success.
     */
    Error *thread_error;
As I mentioned above, these handlers do not generally return an Error type,
so this would need to be an *int;
    /*
     * Migration core would set this when it wants to notify the thread to
     * quit, for example, when an error occurred in other threads, or
     * migration is cancelled by the user.
     */
    bool thread_quit;
^ I guess that was supposed to be a pointer too (*thread_quit).
It's my intention to make this a bool, to make everything managed per-thread.
But that's unnecessary since this flag is common to all these threads.
One bool would be enough, but you'll need to export another API for VFIO to
use otherwise. I suppose that's ok too.
Some context of multifd threads and how that's done there..
We started with one "quit" per thread struct, but then we switched to one
bool exactly as you said, see commit 15f3f21d598148.
If you want to stick with one bool, it's okay too, you can export something
similar in misc.h, e.g. multifd_device_state_save_thread_quitting(), then
we can avoid passing in the "quit" either as handler parameter, or
per-thread flag.
Of course I can "export" this flag via a getter function rather than passing
it as a parameter to SaveLiveCompletePrecopyThreadHandler.
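For example (just a sketch - the exact name and the qatomic_read() access are
my assumption here, not something taken from the current patch):

bool multifd_device_state_save_thread_quitting(void)
{
    /* send_threads_abort stays file-local to multifd-device-state.c */
    return qatomic_read(&send_threads_abort);
}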
It's actually what we do with multifd, these are a bunch of extra threads
to differentiate from the "IO threads" / "multifd threads".
};
Then if any multifd_device_state_save_thread() failed, for example, it
should notify all threads to quit by setting thread_quit, instead of
relying on yet another global variable to show migration needs to quit.
multifd_abort_device_state_save_threads() needs to access
send_threads_abort too.
This may need to become something like:
QLIST_FOREACH() {
    MultiFDDSSaveThreadData *data = ...;
    data->thread_quit = true;
}
At the most basic level that's turning an O(1) operation into an O(n) one.
Besides, it raises the question of who now owns these MultiFDDSSaveThreadData
structures - they could be owned by either the thread pool or the
multifd_device_state code.
I think they should be owned by migration, and with this idea they will need
to be there until the thread pool has completed its work, so the migration
core needs to free them.
Currently the ownership is simple - the multifd_device_state code
allocates such a per-thread structure in multifd_spawn_device_state_save_thread()
and immediately passes its ownership to the thread pool, which
takes care of freeing it once it no longer needs it.
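Roughly (an illustrative sketch only - the submit helper taking a
GDestroyNotify and the free helper name are my assumptions here):

void multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr,
                                            char *idstr, uint32_t instance_id,
                                            void *opaque)
{
    MultiFDDSSaveThreadData *data;

    data = g_new0(MultiFDDSSaveThreadData, 1);
    data->hdlr = hdlr;
    data->idstr = g_strdup(idstr);
    data->instance_id = instance_id;
    data->handler_opaque = opaque;

    /*
     * From this point on the thread pool owns @data and frees it via the
     * provided GDestroyNotify once the thread is done with it.
     */
    thread_pool_submit(send_threads, multifd_device_state_save_thread,
                       data, multifd_dss_save_thread_data_free);
}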
Right, this is another reason why I think having migration own these
structs is better. We used to have dangling task issues when we shifted
ownership of something to the mainloop and then lost track of it (e.g. on TLS
handshake gsources). Those are pretty hard to debug when they hang, because
the migration core no longer has anything linking back to the hung tasks.
I think we should start from having the migration core be able to reach
these thread-based tasks when needed. Migration also has control of the
thread pool, so that would be easier. The thread pool is so far simple, so we
may still need to be able to reference per-task info separately.
These are separate threads, so they are pretty easy to identify
in a debugger or a core dump.
Also, one can access them via the thread pool pointer if absolutely
necessary.
If QMP introspection ever becomes necessary then it could be simply
built into the generic thread pool itself.
Then all thread pool consumers will benefit from it.
Now, with the list implementation, if the thread pool were to free
that MultiFDDSSaveThreadData it would also need to remove it from
the list.
Which in turn would need appropriate locking around this removal
operation, and probably also each time the list is iterated over.
On the other hand if the multifd_device_state code were to own
that MultiFDDSSaveThreadData then it would linger around until
multifd_device_state_send_cleanup() cleans it up even though its
associated thread might be long gone.
Do you see a problem with it? It sounds good to me actually.. and pretty
easy to understand.
So migration creates these MultiFDDSSaveThreadData, then creates threads to
enqueue them, then waits for all threads to complete, then frees these
structs.
One of the benefits of using a thread pool is that it can abstract
memory management away by taking ownership of the data pointed to by
the passed thread opaque pointer (via the passed GDestroyNotify).
I don't see a benefit of re-implementing this also in the migration
code (returning an Error object does *not* require such an approach).
We may want to double check that qmp 'migrate_cancel' will work when save
threads are running, but this can also be done later.
And multifd_join_device_state_save_threads() needs to access
send_threads_ret.
Then this one becomes:
thread_pool_wait(send_threads);
QLIST_FOREACH() {
    MultiFDDSSaveThreadData *data = ...;
    if (data->thread_error) {
        return false;
    }
}
return true;
Same here, having a common error return would save us from having
to iterate over a list (or having a list in the first place).
IMHO perf isn't an issue here. It's a slow path, the number of threads is
small, and the loop is cheap. I prefer to prioritize cleanliness in this case.
Otherwise, any suggestion how we could report an Error* in the threads?
Using Error doesn't need a list - load threads return an Error object
just fine without it:
if (!data->function(data->opaque, &mis->load_threads_abort, &local_err)) {
    MigrationState *s = migrate_get_current();
    assert(local_err);
    /*
     * In case of multiple load threads failing which thread error
     * return we end setting is purely arbitrary.
     */
    migrate_set_error(s, local_err);
}
The same can be done for save threads here (with the caveat of
migrate_set_error() possibly uncovering other errors, as I mentioned earlier).
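For the save side that would look roughly like this (a sketch only - it
assumes the handler gets converted to return bool and take an Error **,
which is not what the current patch does):

static int multifd_device_state_save_thread(void *opaque)
{
    MultiFDDSSaveThreadData *data = opaque;
    g_autoptr(Error) local_err = NULL;

    if (!data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
                    data->handler_opaque, &local_err)) {
        MigrationState *s = migrate_get_current();

        assert(local_err);
        /*
         * Like in the load threads case, which error "wins" with
         * multiple failing threads is arbitrary.
         */
        migrate_set_error(s, local_err);
        /* Also tell the remaining save threads to quit early */
        qatomic_set(&send_threads_abort, true);
    }

    return 0;
}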
These variables ultimately will have to be stored somewhere since
there can be multiple save threads and so multiple instances of
MultiFDDSSaveThreadData.
So these need to be stored somewhere where
multifd_spawn_device_state_save_thread() can reach them to assign
their addresses to MultiFDDSSaveThreadData members.
Then multifd_spawn_device_state_save_thread() will need to manage the
qlist, making sure the migration core remembers what jobs it submitted. It
sounds good to have that bookkeeping when I think about it, instead of
throwing the job at the thread pool and forgetting about it..
It's not "forgetting" about the job but rather letting thread pool
manage it - I think thread pool was introduced so these details
(thread management) are abstracted from the migration code.
Now they would be effectively duplicated in the migration code.
Migration is still managing those as long as you have send_threads_abort,
isn't it? The thread pool doesn't yet have an API to say "let's quit all
the tasks", otherwise I'd be OK with using the pool API instead of having
thread_quit.
The migration code does not manage each thread separately.
It manages them as a pool, and does each operation (wait, abort)
on the pool itself (either literally via ThreadPool or by setting
a variable that's shared by all threads).
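In other words, roughly (reconstructed from what's being discussed here, not
copied from the patch):

void multifd_abort_device_state_save_threads(void)
{
    /* One shared flag is enough - every save thread polls it */
    qatomic_set(&send_threads_abort, true);
}

bool multifd_join_device_state_save_threads(void)
{
    /* Wait for all queued save threads, then check the common result */
    thread_pool_wait(send_threads);

    return send_threads_ret == 0;
}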
However, at that point multifd_device_state_save_thread() can
access them too so it does not need to have them passed via
MultiFDDSSaveThreadData.
However, nothing prevents putting the send_threads* variables
into a global struct (with internal linkage - "static", just as
these separate ones are) if you like such a construct more.
This would indeed be better than the current global vars, but less
favoured if the per-thread way above could work.
You'd still need that list to be a global variable,
so it's the same number of global variables as just putting
the existing variables in a struct (which could even be allocated
in multifd_device_state_send_setup() and deallocated in
multifd_device_state_send_cleanup() for extra memory savings).
Yes this works for me.
I think you got me wrong on "not allowing to introduce global variables".
I'm OK with it, but please still consider..
- Put it under some existing global object rather than having separate
  global variables all over the place..
- Having Error reports
Ok.
And I still think we can change:
typedef int (*SaveLiveCompletePrecopyThreadHandler)(char *idstr,
                                                    uint32_t instance_id,
                                                    bool *abort_flag,
                                                    void *opaque);
To:
typedef int (*SaveLiveCompletePrecopyThreadHandler)(MultiFDDSSaveThreadData*);
No matter what.
We can do that, although this requires "exporting" the MultiFDDSSaveThreadData
type.
These variables have internal linkage limited to the (relatively
small) multifd-device-state.c, so it's not like they are polluting the
namespace of some major migration translation unit.
If someone proposes to introduce 100 global vars in multifd-device-state.c,
I'll strongly stop that.
If it's one global var, I'm OK.
What if it's 5?
===8<===
static QemuMutex queue_job_mutex;
static ThreadPool *send_threads;
static int send_threads_ret;
static bool send_threads_abort;
static MultiFDSendData *device_state_send;
===8<===
I think I should call a stop here. That's what happened..
Please consider introducing something like multifd_send_device_state so we
can avoid anyone in the future randomly adding static global vars.
As I wrote before, I will pack it all into one global variable,
which could be called multifd_send_device_state as you suggest.
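Something roughly like this (the struct and field names are just my sketch,
grouping the variables you quoted above):

typedef struct {
    QemuMutex queue_job_mutex;

    ThreadPool *save_threads;
    int save_threads_ret;
    bool save_threads_abort;

    MultiFDSendData *device_state;
} MultiFDSendDeviceState;

static MultiFDSendDeviceState *multifd_send_device_state;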
Taking into consideration having to manage an extra data structure
(the list), needing more code to do so, and having worse algorithms, I don't
really see the point of using that list.
(This is orthogonal to whether the thread return type is changed to
Error, which could easily be done on the existing save threads pool
implementation).
My bet is that changing to a list is just as easy (10-20 LOC?). If not, I can
try to provide the diff on top of your patch.
I'm also not strictly asking for a list, but for anything that makes the API
cleaner (fewer globals, better error reports, etc.).
I just think introducing that list is a step back, for the reasons I described
above.
And it's not actually necessary for returning an Error.
Thanks,
Thanks,
Maciej