Thanks for your patch! The issue can be reproduced as follows: $ qemu-img create -f qcow2 -b \ "json:{'driver':'raw','file':{ 'driver':'blkdebug','inject-error':[{'event':'write_aio'}], 'image':{'driver':'null-co'}}}" \ overlay.qcow2 $ qemu-io -c 'write 0 64k' overlay.qcow2 $ qemu-img commit overlay.qcow2
While your patch fixes that issue, I still have some comments: On 2017-06-14 08:22, sochin.jiang wrote: > From: "sochin.jiang" <sochin.ji...@huawei.com> > > img_commit could fall into infinite loop if it's blockjob This should be "into an infinite loop" and "its" instead if "it's". > This empty line should be omitted. > fail encountering any I/O error. Try to fix it. Should be "fails on any I/O error" or "fails on encountering any I/O error". Also, you're not trying to fix it but let's all hope you really are fixing it. :-) (So "Fix it." instead of "Try to fix it.") > > Signed-off-by: sochin.jiang <sochin.ji...@huawei.com> > --- > qemu-img.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/qemu-img.c b/qemu-img.c > index 0ad698d..6ba565d 100644 > --- a/qemu-img.c > +++ b/qemu-img.c > @@ -895,8 +895,11 @@ static void run_block_job(BlockJob *job, Error **errp) > aio_poll(aio_context, true); > qemu_progress_print(job->len ? > ((float)job->offset / job->len * 100.f) : 0.0f, > 0); > - } while (!job->ready); > + } while (!job->ready && !job->ret); I think it would be better to test job->completed instead of job->ret. > > + if (job->ret) { > + return; > + } We shouldn't just return here but still do all the deinitialization like call aio_context_release(). I guess the best would be to just skip the block_job_complete_sync() call if job->completed is true. > block_job_complete_sync(job, errp); > aio_context_release(aio_context); Then, there are three more issues I found while reviewing this patch: First, if the block job is completed before block_job_complete_sync() is called (i.e. if an error occurred), it is automatically freed. This is bad because this means we'll have some instances of use-after-free here. Therefore, we need to invoke block_job_ref() before run_block_job() and block_job_unref() afterwards. (And since these functions are currenctly static in blockjob.c, we'll have to make them global.) Secondly, run_block_job() doesn't evaluate job->ret. Therefore it will report success even if the commit failed (it is expecting block_job_complete_sync() to put an error into errp, but it will not do that). So we'll have to do that (manually check job->ret and if it's negative, put an error message into errp; also, assert that job->cancelled is false). Thirdly, we have segfault in bdrv_reopen_prepare() if the image has non-string options... I'll handle this one. I can also handle the other two issues, if you'd like me to. Finally, an iotest would be nice (see my reproducer above). But I can handle that as well, if you decide not to write one. Max
signature.asc
Description: OpenPGP digital signature