On Wed, Apr 18, 2012 at 03:12:02PM +0200, Paolo Bonzini wrote: > The limitation on not having I/O after cancellation cannot really be > held. Even streaming has a very small race window where you could > cancel a job and have it report completion. If this window is hit, > bdrv_change_backing_file() will yield and possibly cause accesses to > dangling pointers etc. > > So, let's just assume that we cannot know exactly what will happen > after the coroutine has set busy to false. We can set a very lax > condition: > > - if we cancel the job, the coroutine won't set it to false again > (and hence will not call co_sleep_ns again). > > - block_job_cancel_sync will wait for the coroutine to exit, which > pretty much ensures no race. > > Instead, we put a very strict condition on what to do while busy = > false. We track the coroutine that executes the job and reenter it > (thus cancelling a wait for example) before block_job_cancel restarts. > Thus you cannot really do anything but co_sleep_ns at that time. > > This patch also drains the I/O *before* canceling the job, so that > block_job_cancel is quite sure to find the coroutine in quiescent > state (busy = false). For mirroring, this means that the job will > complete itself with a consistent view of the disk. > > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com> > --- > block.c | 41 +++++++++++++++++++++++++++++++++++++++-- > block/stream.c | 7 +++---- > block_int.h | 17 ++++++++++++----- > 3 files changed, 54 insertions(+), 11 deletions(-) > > diff --git a/block.c b/block.c > index 9c7d896..48f4414 100644 > --- a/block.c > +++ b/block.c > @@ -4217,7 +4217,15 @@ int block_job_set_speed(BlockJob *job, int64_t value) > > void block_job_cancel(BlockJob *job) > { > + /* Complete all guest I/O before cancelling the job, so that if the > + * job chooses to complete itself it will do so with a consistent > + * view of the disk. > + */ > + bdrv_drain_all(); > job->cancelled = true; > + if (job->co && !job->busy) { > + qemu_coroutine_enter(job->co, NULL); > + } > }
block_job_cancel() is not supposed to block. Now it will wait however long it takes to complete in-flight I/O. It has become synchronous and therefore the point of having a completion/callback is gone. If there really is no asynchronous solution it would be cleaner to throw away all the cancellation/completion logic and just do it synchronously. Stefan