On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 18:54, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > +static int multifd_device_state_save_thread(void *opaque)
> > > +{
> > > +    struct MultiFDDSSaveThreadData *data = opaque;
> > > +    int ret;
> > > +
> > > +    ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
> > > +                     data->handler_opaque);
> > 
> > I thought we discussed somewhere and the plan was we could use Error** here
> > to report errors.  Would that still make sense, or maybe I lost some
> > context?
> 
> That was about *load* threads, here these are *save* threads.

Ah OK.

> 
> Save handlers do not return an Error value, neither save_live_iterate, nor
> save_live_complete_precopy or save_state does so.

Let's try to make new APIs work with Error* if possible.

> 
> > Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
> > send_threads_ret).  Can we make MultiFDDSSaveThreadData the only interface
> > between migration and the threads impl?  So I wonder if it can be:
> > 
> >    ret = data->hdlr(data);
> > 
> > With extended struct like this (I added thread_error and thread_quit):
> > 
> > struct MultiFDDSSaveThreadData {
> >      SaveLiveCompletePrecopyThreadHandler hdlr;
> >      char *idstr;
> >      uint32_t instance_id;
> >      void *handler_opaque;
> >      /*
> >       * Should be NULL when struct passed over to thread, the thread should
> >       * set this if the handler would return false.  It must be kept NULL if
> >       * the handler returned true / success.
> >       */
> >      Error *thread_error;
> 
> As I mentioned above, these handlers do not generally return Error type,
> so this would need to be an *int;
> 
> >      /*
> >       * Migration core would set this when it wants to notify thread to
> >       * quit, for example, when error occured in other threads, or 
> > migration is
> >       * cancelled by the user.
> >       */
> >      bool thread_quit;
> 
>             ^ I guess that was supposed to be a pointer too (*thread_quit).

It's my intention to make this bool, to make everything managed per-thread.

It's actually what we do with multifd, these are a bunch of extra threads
to differeciate from the "IO threads" / "multifd threads".

> 
> > };
> > 
> > Then if any multifd_device_state_save_thread() failed, for example, it
> > should notify all threads to quit by setting thread_quit, instead of
> > relying on yet another global variable to show migration needs to quit.
> 
> multifd_abort_device_state_save_threads() needs to access
> send_threads_abort too.

This may need to become something like:

  QLIST_FOREACH() {
      MultiFDDSSaveThreadData *data = ...;
      data->thread_quit = true;
  }

We may want to double check qmp 'migrate_cancel' will work when save
threads are running, but this can also be done for later.

> 
> And multifd_join_device_state_save_threads() needs to access
> send_threads_ret.

Then this one becomes:

  thread_pool_wait(send_threads);
  QLIST_FOREACH() {
      MultiFDDSSaveThreadData *data = ...;
      if (data->thread_error) {
         return false;
      }
  }
  return true;

> 
> These variables ultimately will have to be stored somewhere since
> there can be multiple save threads and so multiple instances of
> MultiFDDSSaveThreadData.
> 
> So these need to be stored somewhere where
> multifd_spawn_device_state_save_thread() can reach them to assign
> their addresses to MultiFDDSSaveThreadData members.

Then multifd_spawn_device_state_save_thread() will need to manage the
qlist, making sure migration core remembers what jobs it submitted.  It
sounds good to have that bookkeeping when I think about it, instead of
throw the job to the thread pool and forget it..

> 
> However, at that point multifd_device_state_save_thread() can
> access them too so it does not need to have them passed via
> MultiFDDSSaveThreadData.
> 
> However, nothing prevents putting send_threads* variables
> into a global struct (with internal linkage - "static", just as
> these separate ones are) if you like such construct more.

This should be better than the current global vars indeed, but less
favoured if the per-thread way could work above.

Thanks,

-- 
Peter Xu


Reply via email to