01.08.2019 22:44, Max Reitz wrote: > On 30.07.19 21:11, John Snow wrote: >> >> >> On 7/24/19 5:40 AM, Vladimir Sementsov-Ogievskiy wrote: >>> Instead of draining additional nodes in each job code, let's do it in >>> common block_job_drain, draining just all job's children. >>> BlockJobDriver.drain becomes unused, so, drop it at all. >>> >>> It's also a first step to finally get rid of blockjob->blk. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>> --- >>> >>> v3: just resend, as I've some auto returned mails and not sure that >>> v2 reached recipients. >>> >>> v2: apply Max's suggestions: >>> - drop BlockJobDriver.drain >>> - do firtly loop of bdrv_drained_begin and then separate loop >>> of bdrv_drained_end. >>> >>> Hmm, a question here: should I call bdrv_drained_end in reverse >>> order? Or it's OK as is? >>> >> >> I think it should be OK. These nodes don't necessarily have a well >> defined relationship between each other, do they? > > It’s OK. If they do have a relationship, the drain code sorts it out by > itself anyway. > > [...] > >> Seems much nicer to me. What becomes of the ref/unref pairs? >> >> I guess not needed anymore?, since job cleanup necessarily happens in >> the main loop context now and we don't have a backup_complete function >> anymore ...? >> >> In the cases where auto_finalize=true, do we have any guarantee that the >> completion callbacks cannot be scheduled while we are here? > > Let me try to figure this out... > > The only caller of block_job_drain() is job_drain(), whose only caller > is job_finish_sync() in an AIO_WAIT_WHILE() loop. That loop can only > work in the main loop, so I suppose it does run in the main loop (and > consequentially, block_job_drain() runs in the main loop, too). > > That AIO_WAIT_WHILE() loop drains the job (so all nodes) and waits until > the job has completed. To me that looks like it is designed to have the > completion callback run at some point...? > > I suppose anything like that is scheduled by job_enter() in job_drain(), > namely the aio_co_enter() in there. > > (A) If the job runs in the main AioContext, aio_co_enter() will enter > the coroutine if we do not run in a coroutine right now (safe), or it > will schedule it for a later point if we do run in a coroutine. That > latter part may be unsafe, because I guess some coroutine work in > bdrv_drained_begin() or bdrv_drained_end() may wake it up, which can > then mess everything up. > > (B) If the job runs in another context, aio_co_enter() will schedule the > job immediately, which I guess means that it can run at any point? So > it could complete at any point, including block_job_drain(). Ah, no, > actually. AIO_WAIT_WHILE() will have the job’s context acquired while > evaluating the condition (that is, when calling block_job_drain()). So > this is safe. > > > So, well. Unless Vladimir can give you a guarantee why e.g. > block_job_remove_all_bdrv() won’t run during e.g. bdrv_drained_begin(), > I suppose you’re right and the node list needs to be copied at the > beginning of this function and all nodes should be ref’d. >
Thanks a lot! OK, I'll resend -- Best regards, Vladimir