On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 21:34, Peter Xu wrote:
> > On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.02.2025 18:54, Peter Xu wrote:
> > > > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > > > +static int multifd_device_state_save_thread(void *opaque)
> > > > > +{
> > > > > +    struct MultiFDDSSaveThreadData *data = opaque;
> > > > > +    int ret;
> > > > > +
> > > > > +    ret = data->hdlr(data->idstr, data->instance_id,
> > > > > +                     &send_threads_abort, data->handler_opaque);
> > > >
> > > > I thought we discussed somewhere and the plan was we could use Error**
> > > > here to report errors.  Would that still make sense, or maybe I lost
> > > > some context?
> > >
> > > That was about *load* threads, here these are *save* threads.
> >
> > Ah OK.
> >
> > >
> > > Save handlers do not return an Error value, neither save_live_iterate, nor
> > > save_live_complete_precopy or save_state does so.
> >
> > Let's try to make new APIs work with Error* if possible.
>
> Let's assume that these threads return an Error object.
>
> What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?

IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
context, as the Error* will be used in one of the threads of the pool, not
the migration thread.

The goal is to be able to set the Error* with migrate_set_error(), so that
when migration fails, query-migrate can return the error to libvirt;
migration always tries to remember the 1st error hit if ever possible.

It's multifd_device_state_save_thread() that would do migrate_set_error(),
not the migration thread.  qemu_savevm_state_complete_*() are indeed not
ready to pass Errors, but they're not in the stack being discussed.

> Both it and its caller qemu_savevm_state_complete_precopy() only handle int
> errors.
>
> qemu_savevm_state_complete_precopy() in turn has 4 callers, half of which (2)
> also would need to be enlightened with Error handling somehow.

Right, we don't need to touch those, as explained above.

Generally speaking, IMHO it's always good to add new code with Error*
reports rather than retvals in migration, even if the new code is in the
migration thread stack.  It makes future changes to switch to Error*
easier.

> > > >
> > > > Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
> > > > send_threads_ret).  Can we make MultiFDDSSaveThreadData the only interface
> > > > between migration and the threads impl?  So I wonder if it can be:
> > > >
> > > >          ret = data->hdlr(data);
> > > >
> > > > With extended struct like this (I added thread_error and thread_quit):
> > > >
> > > > struct MultiFDDSSaveThreadData {
> > > >     SaveLiveCompletePrecopyThreadHandler hdlr;
> > > >     char *idstr;
> > > >     uint32_t instance_id;
> > > >     void *handler_opaque;
> > > >     /*
> > > >      * Should be NULL when struct passed over to thread, the thread should
> > > >      * set this if the handler would return false.  It must be kept NULL if
> > > >      * the handler returned true / success.
> > > >      */
> > > >     Error *thread_error;
> > >
> > > As I mentioned above, these handlers do not generally return Error type,
> > > so this would need to be an *int;
> > >
> > > >     /*
> > > >      * Migration core would set this when it wants to notify thread to
> > > >      * quit, for example, when error occured in other threads, or migration is
> > > >      * cancelled by the user.
> > > >      */
> > > >     bool thread_quit;
> > >
> > > ^ I guess that was supposed to be a pointer too (*thread_quit).
> >
> > It's my intention to make this bool, to make everything managed per-thread.
>
> But that's unnecessary since this flag is common to all these threads.

One bool would be enough, but then you'll need to export another API for
VFIO to use.  I suppose that's OK too.

Some context on multifd threads and how that's done there: we started with
one "quit" per thread struct, but then we switched to one bool exactly as
you said, see commit 15f3f21d598148.

If you want to stick with one bool, that's okay too: you can export
something similar in misc.h, e.g. multifd_device_state_save_thread_quitting(),
then we can avoid passing in the "quit" either as a handler parameter or as
a per-thread flag.

> > It's actually what we do with multifd, these are a bunch of extra threads
> > to differeciate from the "IO threads" / "multifd threads".
> >
> > > > };
> > > >
> > > > Then if any multifd_device_state_save_thread() failed, for example, it
> > > > should notify all threads to quit by setting thread_quit, instead of
> > > > relying on yet another global variable to show migration needs to quit.
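
To make the Error* side concrete, here is roughly what I mean for the
thread function plus the misc.h helper mentioned above.  Completely
untested sketch; it assumes the handler takes the data struct and fills
data->thread_error on failure, and it keeps your single send_threads_abort
bool rather than a per-thread flag:

===8<===
static int multifd_device_state_save_thread(void *opaque)
{
    struct MultiFDDSSaveThreadData *data = opaque;

    if (data->hdlr(data) < 0) {
        /* The handler must have filled data->thread_error on failure */
        assert(data->thread_error);

        /* Remember the 1st error hit, so query-migrate can report it */
        migrate_set_error(migrate_get_current(), data->thread_error);

        /* Ask the sibling save threads to quit too */
        qatomic_set(&send_threads_abort, true);
    }

    /* The retval stops mattering once the Error carries the failure */
    return 0;
}

/* Exported in misc.h, so e.g. VFIO can poll it from inside its handler */
bool multifd_device_state_save_thread_quitting(void)
{
    return qatomic_read(&send_threads_abort);
}
===8<===

(migrate_set_error() only takes a copy of the Error, so the per-thread one
would still be freed together with the struct at cleanup time.)
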
> > >
> > > multifd_abort_device_state_save_threads() needs to access
> > > send_threads_abort too.
> >
> > This may need to become something like:
> >
> >   QLIST_FOREACH() {
> >       MultiFDDSSaveThreadData *data = ...;
> >       data->thread_quit = true;
> >   }
>
> At the most basic level that's turning O(1) operation into O(n).
>
> Besides, it creates a question now who now owns these MultiFDDSSaveThreadData
> structures - they could be owned by either thread pool or the
> multifd_device_state code.

I think they should be owned by migration, and with this idea they will
need to stay around until the thread pool has completed its work, so the
migration core needs to free them.

> Currently the ownership is simple - the multifd_device_state code
> allocates such per-thread structure in
> multifd_spawn_device_state_save_thread() and immediately passes its
> ownership to the thread pool which takes care to free it once it no
> longer needs it.

Right, this is another reason why I think having migration owning these
structs is better.

We used to have dangling task issues when we shifted ownership of something
to the mainloop and then lost track of it (e.g. the TLS handshake
gsources).  Those are pretty hard to debug when they hang, because the
migration core has nothing left to link back to the hung tasks.

I think we should start from having the migration core able to reach these
thread-based tasks when needed.  Migration also has control of the thread
pool, which would make it easier.  The thread pool is still simple so far,
so we may still need to be able to reference per-task info separately.

> Now, with the list implementation if the thread pool were to free
> that MultiFDDSSaveThreadData it would also need to release it from
> the list.
>
> Which in turn would need appropriate locking around this removal
> operation and probably also each time the list is iterated over.
>
> On the other hand if the multifd_device_state code were to own
> that MultiFDDSSaveThreadData then it would linger around until
> multifd_device_state_send_cleanup() cleans it up even though its
> associated thread might be long gone.

Do you see a problem with it?  It sounds good to me actually.. and pretty
easy to understand.  So migration creates these MultiFDDSSaveThreadData,
then creates threads to enqueue them, then waits for all threads to
complete, then frees these structs.

> > We may want to double check qmp 'migrate_cancel' will work when save
> > threads are running, but this can also be done for later.
> >
> > > And multifd_join_device_state_save_threads() needs to access
> > > send_threads_ret.
> >
> > Then this one becomes:
> >
> >   thread_pool_wait(send_threads);
> >   QLIST_FOREACH() {
> >       MultiFDDSSaveThreadData *data = ...;
> >       if (data->thread_error) {
> >           return false;
> >       }
> >   }
> >   return true;
>
> Same here, having a common error return would save us from having
> to iterate over a list (or having a list in the first place).

IMHO perf isn't an issue here.  It's a slow path, the number of threads is
small, and the loop is cheap.  I'd prefer to prioritize cleanliness in this
case.

Otherwise, any suggestion on how we could report an Error* from the threads?

> >
> > > These variables ultimately will have to be stored somewhere since
> > > there can be multiple save threads and so multiple instances of
> > > MultiFDDSSaveThreadData.
> > >
> > > So these need to be stored somewhere where
> > > multifd_spawn_device_state_save_thread() can reach them to assign
> > > their addresses to MultiFDDSSaveThreadData members.
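
With a list, all of the functions above can reach them.  Roughly something
like this (untested; the send_thread_data list and the "node" QLIST_ENTRY()
field are made-up names, this shows the per-thread thread_quit variant, and
it glosses over the locking question you raised - the existing
queue_job_mutex could cover insertion vs. iteration if needed):

===8<===
static QLIST_HEAD(, MultiFDDSSaveThreadData) send_thread_data =
    QLIST_HEAD_INITIALIZER(send_thread_data);

void multifd_abort_device_state_save_threads(void)
{
    struct MultiFDDSSaveThreadData *data;

    QLIST_FOREACH(data, &send_thread_data, node) {
        qatomic_set(&data->thread_quit, true);
    }
}

bool multifd_join_device_state_save_threads(void)
{
    struct MultiFDDSSaveThreadData *data;
    bool ok = true;

    thread_pool_wait(send_threads);

    QLIST_FOREACH(data, &send_thread_data, node) {
        if (data->thread_error) {
            ok = false;
        }
    }

    /*
     * The structs stay on the list; multifd_device_state_send_cleanup()
     * frees them (and their thread_error, if any) at the end.
     */
    return ok;
}
===8<===
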
> >
> > Then multifd_spawn_device_state_save_thread() will need to manage the
> > qlist, making sure migration core remembers what jobs it submitted.  It
> > sounds good to have that bookkeeping when I think about it, instead of
> > throw the job to the thread pool and forget it..
>
> It's not "forgetting" about the job but rather letting thread pool
> manage it - I think thread pool was introduced so these details
> (thread management) are abstracted from the migration code.
> Now they would be effectively duplicated in the migration code.

Migration is still managing those as long as you have send_threads_abort,
isn't it?  The thread pool doesn't yet have an API to say "let's quit all
the tasks"; otherwise I'd be OK with using the pool API instead of having
thread_quit.

> > >
> > > However, at that point multifd_device_state_save_thread() can
> > > access them too so it does not need to have them passed via
> > > MultiFDDSSaveThreadData.
> > >
> > > However, nothing prevents putting send_threads* variables
> > > into a global struct (with internal linkage - "static", just as
> > > these separate ones are) if you like such construct more.
> >
> > This should be better than the current global vars indeed, but less
> > favoured if the per-thread way could work above.
>
> You still need that list to be a global variable,
> so it's the same amount of global variables as just putting
> the existing variables in a struct (which could be even allocated
> in multifd_device_state_send_setup() and deallocated in
> multifd_device_state_send_cleanup() for extra memory savings).

Yes, this works for me.  I think you got me wrong on "not allowing to
introduce global variables".  I'm OK with that, but please still consider..

  - Putting it under some existing global object rather than having
    separate global variables all over the place..

  - Having Error reports

And I still think we can change:

typedef int (*SaveLiveCompletePrecopyThreadHandler)(char *idstr,
                                                    uint32_t instance_id,
                                                    bool *abort_flag,
                                                    void *opaque);

To:

typedef int (*SaveLiveCompletePrecopyThreadHandler)(MultiFDDSSaveThreadData*);

No matter what.

> These variables are having internal linkage limited to (relatively
> small) multifd-device-state.c, so it's not like they are polluting
> namespace in some major migration translation unit.

If someone proposes to introduce 100 global vars in multifd-device-state.c,
I'll strongly stop that.  If it's one global var, I'm OK.  What if it's 5?

===8<===
static QemuMutex queue_job_mutex;

static ThreadPool *send_threads;
static int send_threads_ret;
static bool send_threads_abort;

static MultiFDSendData *device_state_send;
===8<===

I think this is where I should start calling a stop.  That's what
happened..  Please consider introducing something like
multifd_send_device_state, so we can avoid anyone randomly adding static
global vars there in the future.

> Taking into consideration having to manage an extra data structure
> (list), needing more code to do so, having worse algorithms I don't
> really see a point of using that list.
>
> (This is orthogonal to whether the thread return type is changed to
> Error which could be easily done on the existing save threads pool
> implementation).

My bet is that changing to a list is just as easy (10-20 LOC?).  If not, I
can try to provide a diff on top of your patch.  I'm also not strictly
asking for a list, but for anything that makes the API cleaner (fewer
globals, better error reports, etc.).
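
For instance, the kind of wrapper object I mean would be roughly the below
(untested; the struct and field names are only illustrative, mirroring the
statics quoted above, with send_threads_ret dropped in favour of Error
reporting):

===8<===
typedef struct {
    QemuMutex queue_job_mutex;

    ThreadPool *threads;
    bool threads_abort;

    MultiFDSendData *device_state;

    /* Per-thread job data, if we go the list route */
    QLIST_HEAD(, MultiFDDSSaveThreadData) thread_data;
} MultiFDSendDeviceState;

static MultiFDSendDeviceState *multifd_send_device_state;
===8<===

It could then be allocated in multifd_device_state_send_setup() and freed
in multifd_device_state_send_cleanup(), as you suggested above.

Thanks,

-- 
Peter Xu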