Most jobs do the same thing when they leave their running loop:
- Store the return code in a structure
- Wait to receive this structure in the main thread
- Signal job completion via job_completed

More seriously, when we utilize job_defer_to_main_loop_bh to call
a function that calls job_completed, job_finalize_single will run
in a context where it has recursively taken the aio_context lock,
which can cause hangs if it puts down a reference that causes a flush.

The job infrastructure is perfectly capable of registering job
completion itself when we leave the job's entry point. In this
context, we can signal job completion from outside of the aio_context,
which should allow for job cleanup code to run with only one lock.

Signed-off-by: John Snow <js...@redhat.com>
---
 include/qemu/job.h |  7 +++++++
 job.c              | 19 +++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 845ad00c03..0c24e8704f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -204,6 +204,13 @@ struct JobDriver {
      */
     void (*drain)(Job *job);
 
+    /**
+     * If the callback is not NULL, exit will be invoked from the main thread
+     * when the job's coroutine has finished, but before transactional
+     * convergence; before @prepare or @abort.
+     */
+    void (*exit)(Job *job);
+
     /**
      * If the callback is not NULL, prepare will be invoked when all the jobs
      * belonging to the same transaction complete; or upon this job's completion
diff --git a/job.c b/job.c
index b281f30375..cc5ac9ac30 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,19 @@ void job_drain(Job *job)
     }
 }
 
+static void job_exit(void *opaque)
+{
+    Job *job = (Job *)opaque;
+    AioContext *aio_context = job->aio_context;
+
+    if (job->driver->exit) {
+        aio_context_acquire(aio_context);
+        job->driver->exit(job);
+        aio_context_release(aio_context);
+    }
+    job_completed(job, job->ret);
+}
+
 /**
  * All jobs must allow a pause point before entering their job proper. This
  * ensures that jobs can be paused prior to being started, then resumed later.
@@ -546,6 +559,12 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->start);
     job_pause_point(job);
     job->driver->start(job);
+    if (!job->deferred_to_main_loop) {
+        job->deferred_to_main_loop = true;
+        aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                job_exit,
+                                job);
+    }
 }
 
 
-- 
2.14.4
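
To illustrate the control flow described in the commit message, here is a minimal standalone sketch. It is a toy model, not QEMU code: ToyJob, ToyDriver, toy_lock, toy_unlock, and the other toy_* names are hypothetical stand-ins, the context lock is modelled as a plain depth counter, and the bottom-half scheduling is replaced by a direct call. It only shows the ordering the patch aims for: the driver's exit hook runs with the lock taken once, and completion is signalled after the lock has been dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are QEMU types. */
typedef struct ToyJob ToyJob;

typedef struct ToyDriver {
    void (*start)(ToyJob *job); /* job body; records its result in job->ret */
    void (*exit)(ToyJob *job);  /* optional cleanup hook, may be NULL */
} ToyDriver;

struct ToyJob {
    const ToyDriver *driver;
    int ret;             /* return code stored by start() */
    int lock_depth;      /* models nested acquisitions of the context lock */
    int exit_depth;      /* lock depth observed inside the exit hook */
    int completed_depth; /* lock depth observed when completion is signalled */
    int completed_ret;   /* return code passed to completion */
    int completed;       /* set once completion has been signalled */
};

static void toy_lock(ToyJob *job)
{
    job->lock_depth++;
}

static void toy_unlock(ToyJob *job)
{
    assert(job->lock_depth > 0);
    job->lock_depth--;
}

/* Stand-in for job_completed(): records what it saw. */
static void toy_job_completed(ToyJob *job, int ret)
{
    job->completed = 1;
    job->completed_ret = ret;
    job->completed_depth = job->lock_depth;
}

/* Same shape as job_exit() in the patch: hook under the lock, completion after. */
static void toy_job_exit(ToyJob *job)
{
    if (job->driver->exit) {
        toy_lock(job);
        job->driver->exit(job);
        toy_unlock(job);
    }
    toy_job_completed(job, job->ret);
}

/* Same shape as job_co_entry(), with the bottom half replaced by a direct call. */
static void toy_job_run(ToyJob *job)
{
    job->driver->start(job);
    toy_job_exit(job);
}

/* Example driver: the body only stores a return code. */
static void demo_start(ToyJob *job)
{
    job->ret = -5;
}

static void demo_exit(ToyJob *job)
{
    job->exit_depth = job->lock_depth;
}

static const ToyDriver demo_driver = { .start = demo_start, .exit = demo_exit };

/* Runs the demo job and returns its final state for inspection. */
ToyJob toy_run_demo(void)
{
    ToyJob job = { .driver = &demo_driver, .exit_depth = -1, .completed_depth = -1 };
    toy_job_run(&job);
    return job;
}
```

In this model the job body does nothing but store its return code; the infrastructure (toy_job_run) takes care of handing off and signalling completion, which is the boilerplate the patch removes from individual drivers.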