Re: [Qemu-devel] [PATCH v3] blockjob: drain all job nodes in block_job_drain

Max Reitz Thu, 01 Aug 2019 12:45:27 -0700

On 30.07.19 21:11, John Snow wrote:
> 
> 
> On 7/24/19 5:40 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Instead of draining additional nodes in each job code, let's do it in
>> common block_job_drain, draining just all job's children.
>> BlockJobDriver.drain becomes unused, so, drop it at all.
>>
>> It's also a first step to finally get rid of blockjob->blk.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>> ---
>>
>> v3: just resend, as I've some auto returned mails and not sure that
>>     v2 reached recipients.
>>
>> v2: apply Max's suggestions:
>>  - drop BlockJobDriver.drain
>>  - do firtly loop of bdrv_drained_begin and then separate loop
>>    of bdrv_drained_end.
>>
>>    Hmm, a question here: should I call bdrv_drained_end in reverse
>>    order? Or it's OK as is?
>>
> 
> I think it should be OK. These nodes don't necessarily have a well
> defined relationship between each other, do they?


It’s OK.  If they do have a relationship, the drain code sorts it out by
itself anyway.

[...]

> Seems much nicer to me. What becomes of the ref/unref pairs?
> 
> I guess not needed anymore?, since job cleanup necessarily happens in
> the main loop context now and we don't have a backup_complete function
> anymore ...?
> 
> In the cases where auto_finalize=true, do we have any guarantee that the
> completion callbacks cannot be scheduled while we are here?

Let me try to figure this out...

The only caller of block_job_drain() is job_drain(), whose only caller
is job_finish_sync() in an AIO_WAIT_WHILE() loop.  That loop can only
work in the main loop, so I suppose it does run in the main loop (and
consequentially, block_job_drain() runs in the main loop, too).

That AIO_WAIT_WHILE() loop drains the job (so all nodes) and waits until
the job has completed.  To me that looks like it is designed to have the
completion callback run at some point...?

I suppose anything like that is scheduled by job_enter() in job_drain(),
namely the aio_co_enter() in there.

(A) If the job runs in the main AioContext, aio_co_enter() will enter
the coroutine if we do not run in a coroutine right now (safe), or it
will schedule it for a later point if we do run in a coroutine.  That
latter part may be unsafe, because I guess some coroutine work in
bdrv_drained_begin() or bdrv_drained_end() may wake it up, which can
then mess everything up.

(B) If the job runs in another context, aio_co_enter() will schedule the
job immediately, which I guess means that it can run at any point?  So
it could complete at any point, including block_job_drain().  Ah, no,
actually.  AIO_WAIT_WHILE() will have the job’s context acquired while
evaluating the condition (that is, when calling block_job_drain()).  So
this is safe.


So, well.  Unless Vladimir can give you a guarantee why e.g.
block_job_remove_all_bdrv() won’t run during e.g. bdrv_drained_begin(),
I suppose you’re right and the node list needs to be copied at the
beginning of this function and all nodes should be ref’d.

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v3] blockjob: drain all job nodes in block_job_drain

Reply via email to