Re: [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

Liang Li Wed, 31 Jan 2018 18:20:13 -0800

On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
> 
> 
> On 01/30/2018 03:38 AM, Liang Li wrote:
>> When doing drive mirror to a low speed shared storage, if there was heavy
>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>> can't be canceled immediately, it would keep running until the heavy BLK IO
>> workload stopped in the VM.
>> 
>> Because libvirt depends on block-job-cancel for block live migration, the
>> current block-job-cancel has the semantic to make sure data is in sync after
>> the 'ready' event.  This semantic can't meet some requirement, for example,
>> people may use drive mirror for realtime backup while need the ability of
>> block live migration. If drive mirror can't not be cancelled immediately,
>> it means block live migration need to wait, because libvirt make use drive
>> mirror to implement block live migration and only one drive mirror block
>> job is allowed at the same time for a give block dev.
>> 
>> We need a new interface for 'force cancel', which could quit block job
>> immediately if don't care about whether data is in sync or not.
>> 
>> 'force' is not used by libvirt currently, to make things simple, change
>> it's semantic slightly, hope it will not break some use case which need its
>> original semantic.
>> 
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Jeff Cody <jc...@redhat.com>
>> Cc: Kevin Wolf <kw...@redhat.com>
>> Cc: Max Reitz <mre...@redhat.com>
>> Cc: Eric Blake <ebl...@redhat.com>
>> Cc: John Snow <js...@redhat.com>
>> Reported-by: Huaitong Han <huanhuait...@didichuxing.com>
>> Signed-off-by: Huaitong Han <huanhuait...@didichuxing.com>
>> Signed-off-by: Liang Li <liliang...@didichuxing.com>
>> ---
>> block/mirror.c            |  9 +++------
>> blockdev.c                |  4 ++--
>> blockjob.c                | 11 ++++++-----
>> hmp-commands.hx           |  3 ++-
>> include/block/blockjob.h  |  9 ++++++++-
>> qapi/block-core.json      |  6 ++++--
>> tests/test-blockjob-txn.c |  8 ++++----
>> 7 files changed, 29 insertions(+), 21 deletions(-)
>> 
>> diff --git a/block/mirror.c b/block/mirror.c
>> index c9badc1..c22dff9 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
>> 
>>         ret = 0;
>>         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>> -        if (!s->synced) {
>> -            block_job_sleep_ns(&s->common, delay_ns);
>> -            if (block_job_is_cancelled(&s->common)) {
>> -                break;
>> -            }
>> +        if (block_job_is_cancelled(&s->common) && s->common.force) {
>> +            break;
> 
> what's the justification for removing the sleep in the case that
> !s->synced && !block_job_is_cancelled(...) ?
> 
if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) {}'
will execute, there is a block_job_sleep_ns there.


block_job_sleep_ns is for rate throttling, if there is no more data to sync, 
sleep is not needed, right?

>>         } else if (!should_complete) {
>>             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
>>             block_job_sleep_ns(&s->common, delay_ns);
>> @@ -887,7 +884,7 @@ immediate_exit:
>>          * or it was cancelled prematurely so that we do not guarantee that
>>          * the target is a copy of the source.
>>          */
>> -        assert(ret < 0 || (!s->synced && 
>> block_job_is_cancelled(&s->common)));
>> +        assert(ret < 0 || block_job_is_cancelled(&s->common));
> 
> This assertion gets weaker in the case where force isn't provided, is
> that desired?
> 
yes. if force quit is used, the following condition can be true

(ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common)) 

so the above assert should be changed, or it will be failed.

>>         assert(need_drain);
>>         mirror_wait_for_all_io(s);
>>     }
>> diff --git a/blockdev.c b/blockdev.c
>> index 8e977ee..039f156 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>>         aio_context_acquire(aio_context);
>> 
>>         if (bs->job) {
>> -            block_job_cancel(bs->job);
>> +            block_job_cancel(bs->job, false);
>>         }
>> 
>>         aio_context_release(aio_context);
>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
>>     }
>> 
>>     trace_qmp_block_job_cancel(job);
>> -    block_job_cancel(job);
>> +    block_job_cancel(job, force);
>> out:
>>     aio_context_release(aio_context);
>> }
>> diff --git a/blockjob.c b/blockjob.c
>> index f5cea84..0aacb50 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
>>     block_job_unref(job);
>> }
>> 
>> -static void block_job_cancel_async(BlockJob *job)
>> +static void block_job_cancel_async(BlockJob *job, bool force)
>> {
>>     if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
>>         block_job_iostatus_reset(job);
>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job)
>>         job->pause_count--;
>>     }
>>     job->cancelled = true;
>> +    job->force = force;
>> }
>> 
> 
> I suppose this is okay; we're setting a permanent property of the job as
> part of a limited operation.
> 
> Granted, the job should disappear afterwards, so it should never
> conflict, but it made me wonder for a moment.
> 
> What happens if we cancel with force = true and then immediately cancel
> again with force = false? What happens? Can it cause issues?
> 

Indeed. It can be fixed by:

if (!job->force)
   job->force = force

it's that ok ?

>> static int block_job_finish_sync(BlockJob *job,
>> @@ -437,7 +438,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
>>      * on the caller, so leave it. */
>>     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>>         if (other_job != job) {
>> -            block_job_cancel_async(other_job);
>> +            block_job_cancel_async(other_job, true);
> 
> What's the rationale for deciding that transactional cancellations are
> always force=true?
> 

no particular reason, just try to speed up the abort process. :)

> Granted, this doesn't matter currently because mirror isn't a
> transactional job, but that makes the decision all the more curious.
> 
>>         }
>>     }

Thanks for your comments.

Liang

Re: [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel

Reply via email to