On 30.07.19 21:11, John Snow wrote: > > > On 7/24/19 5:40 AM, Vladimir Sementsov-Ogievskiy wrote: >> Instead of draining additional nodes in each job code, let's do it in >> common block_job_drain, draining just all job's children. >> BlockJobDriver.drain becomes unused, so, drop it at all. >> >> It's also a first step to finally get rid of blockjob->blk. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >> --- >> >> v3: just resend, as I've some auto returned mails and not sure that >> v2 reached recipients. >> >> v2: apply Max's suggestions: >> - drop BlockJobDriver.drain >> - do firtly loop of bdrv_drained_begin and then separate loop >> of bdrv_drained_end. >> >> Hmm, a question here: should I call bdrv_drained_end in reverse >> order? Or it's OK as is? >> > > I think it should be OK. These nodes don't necessarily have a well > defined relationship between each other, do they?
It’s OK. If they do have a relationship, the drain code sorts it out by itself anyway. [...] > Seems much nicer to me. What becomes of the ref/unref pairs? > > I guess not needed anymore?, since job cleanup necessarily happens in > the main loop context now and we don't have a backup_complete function > anymore ...? > > In the cases where auto_finalize=true, do we have any guarantee that the > completion callbacks cannot be scheduled while we are here? Let me try to figure this out... The only caller of block_job_drain() is job_drain(), whose only caller is job_finish_sync() in an AIO_WAIT_WHILE() loop. That loop can only work in the main loop, so I suppose it does run in the main loop (and consequentially, block_job_drain() runs in the main loop, too). That AIO_WAIT_WHILE() loop drains the job (so all nodes) and waits until the job has completed. To me that looks like it is designed to have the completion callback run at some point...? I suppose anything like that is scheduled by job_enter() in job_drain(), namely the aio_co_enter() in there. (A) If the job runs in the main AioContext, aio_co_enter() will enter the coroutine if we do not run in a coroutine right now (safe), or it will schedule it for a later point if we do run in a coroutine. That latter part may be unsafe, because I guess some coroutine work in bdrv_drained_begin() or bdrv_drained_end() may wake it up, which can then mess everything up. (B) If the job runs in another context, aio_co_enter() will schedule the job immediately, which I guess means that it can run at any point? So it could complete at any point, including block_job_drain(). Ah, no, actually. AIO_WAIT_WHILE() will have the job’s context acquired while evaluating the condition (that is, when calling block_job_drain()). So this is safe. So, well. Unless Vladimir can give you a guarantee why e.g. block_job_remove_all_bdrv() won’t run during e.g. bdrv_drained_begin(), I suppose you’re right and the node list needs to be copied at the beginning of this function and all nodes should be ref’d. Max
signature.asc
Description: OpenPGP digital signature