From: Michael Qiu <qiud...@huayun.com> v4: rebase to latest code
v3: reformat the commit log, remove duplicate content v2: modify the coredump backtrace within commit log with the newest qemu with master branch Currently, if guest has workloads, IO thread will acquire aio_context lock before do io_submit, it leads to segmentfault when do block commit after snapshot. Just like below: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f7c7d91f700 (LWP 99907)] 0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437 1437 ../block/mirror.c: No such file or directory. (gdb) p s->job $17 = (MirrorBlockJob *) 0x0 (gdb) p s->stop $18 = false (gdb) bt Switch to qemu main thread: /lib/../lib64/libpthread.so.0 /lib/../lib64/libpthread.so.0 ../util/qemu-thread-posix.c:79 qapi/qapi-commands-block-core.c:346 ../qapi/qmp-dispatch.c:110 /lib/../lib64/libglib-2.0.so.0 In IO thread when do bdrv_mirror_top_pwritev, the job is NULL, and stop field is false, this means the MirrorBDSOpaque "s" object has not been initialized yet, and this object is initialized by block_job_create(), but the initialize process is stuck in acquiring the lock. The rootcause is that qemu do release/acquire when hold the lock, at the same time, IO thread get the lock after release stage, and the crash occured. Actually, in this situation, job->job.aio_context will not equal to qemu_get_aio_context(), and will be the same as bs->aio_context, thus, no need to release the lock, becasue bdrv_root_attach_child() will not change the context. This patch fix this issue. Signed-off-by: Michael Qiu <qiud...@huayun.com> --- blockjob.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/blockjob.c b/blockjob.c index 98ac8af982..51a09f3b60 100644 --- a/blockjob.c +++ b/blockjob.c @@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs, BdrvChild *c; bdrv_ref(bs); - if (job->job.aio_context != qemu_get_aio_context()) { + if (bdrv_get_aio_context(bs) != job->job.aio_context && + job->job.aio_context != qemu_get_aio_context()) { aio_context_release(job->job.aio_context); } c = bdrv_root_attach_child(bs, name, &child_job, 0, job->job.aio_context, perm, shared_perm, job, errp); - if (job->job.aio_context != qemu_get_aio_context()) { + if (bdrv_get_aio_context(bs) != job->job.aio_context && + job->job.aio_context != qemu_get_aio_context()) { aio_context_acquire(job->job.aio_context); } if (c == NULL) { -- 2.22.0