On 17.01.2019 at 13:57, Denis Plotnikov wrote:
> Kevin,
>
> could you please take a look at my last comments?
I read it, and what it told me is essentially that I need to work on it
myself to fully understand the problem and the possible acceptable
solutions, because you can't seem to find one yourself. I will, but I
can't guarantee when I can find the time for it.

Kevin

> On 15.01.2019 10:22, Denis Plotnikov wrote:
> > ping ping ping ping!!!!
> >
> > On 09.01.2019 11:18, Denis Plotnikov wrote:
> >> ping ping!!!
> >>
> >> On 18.12.2018 11:53, Denis Plotnikov wrote:
> >>> ping ping
> >>>
> >>> On 14.12.2018 14:54, Denis Plotnikov wrote:
> >>>>
> >>>>
> >>>> On 13.12.2018 15:20, Kevin Wolf wrote:
> >>>>> On 13.12.2018 at 12:07, Denis Plotnikov wrote:
> >>>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
> >>>>>>> On 11.12.2018 at 17:55, Denis Plotnikov wrote:
> >>>>>>>>> Why involve the AioContext at all? This could all be kept at
> >>>>>>>>> the BlockBackend level without extending the layering
> >>>>>>>>> violation that aio_disable_external() is.
> >>>>>>>>>
> >>>>>>>>> BlockBackends get notified when their root node is drained,
> >>>>>>>>> so hooking things up there should be as easy, if not even
> >>>>>>>>> easier than in AioContext.
> >>>>>>>>
> >>>>>>>> Just want to make sure that I understood correctly what you
> >>>>>>>> meant by "BlockBackends get notified". Did you mean that
> >>>>>>>> bdrv_drain_end calls the child's role callback
> >>>>>>>> blk_root_drained_end by calling bdrv_parent_drained_end?
> >>>>>>>
> >>>>>>> Yes, blk_root_drained_begin/end calls are all you need.
> >>>>>>> Specifically, their adjustments to blk->quiesce_counter that
> >>>>>>> are already there, and in the 'if (--blk->quiesce_counter == 0)'
> >>>>>>> block of blk_root_drained_end() we can resume the queued
> >>>>>>> requests.
> >>>>>> It sounds like it should work that way, but it doesn't, and here
> >>>>>> is why: when doing mirror we may resume the postponed coroutines
> >>>>>> too early, while the underlying bs is still protected from
> >>>>>> writing, and thus we hit the assertion on write request
> >>>>>> execution in bdrv_co_write_req_prepare when resuming the
> >>>>>> postponed coroutines.
> >>>>>>
> >>>>>> The thing is that the bs is protected from writing before the
> >>>>>> execution of bdrv_replace_node at mirror_exit_common, and
> >>>>>> bdrv_replace_node calls bdrv_replace_child_noperm which, in
> >>>>>> turn, calls child->role->drained_end, where one of the callbacks
> >>>>>> is blk_root_drained_end, which checks
> >>>>>> if (--blk->quiesce_counter == 0) and runs the postponed requests
> >>>>>> (coroutines) if the condition is true.
> >>>>>
> >>>>> Hm, so something is messed up with the drain sections in the
> >>>>> mirror driver. We have:
> >>>>>
> >>>>> bdrv_drained_begin(target_bs);
> >>>>> bdrv_replace_node(to_replace, target_bs, &local_err);
> >>>>> bdrv_drained_end(target_bs);
> >>>>>
> >>>>> Obviously, the intention was to keep the BlockBackend drained
> >>>>> during bdrv_replace_node(). So how could blk->quiesce_counter
> >>>>> ever get to 0 inside bdrv_replace_node() when target_bs is
> >>>>> drained?
> >>>>>
> >>>>> Looking at bdrv_replace_child_noperm(), it seems that the
> >>>>> function has a bug: even if old_bs and new_bs are both drained,
> >>>>> the quiesce_counter for the parent reaches 0 for a moment,
> >>>>> because we call .drained_end for the old child first and
> >>>>> .drained_begin for the new one only later.
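To make the momentary drop to zero easier to see, here is a minimal,
self-contained sketch of the mechanism under discussion. It is a toy
model, not QEMU code: the quiesce counter, the drained_begin/drained_end
naming and the idea of resuming postponed requests in the '== 0' branch
are taken from the thread above; every other name (Backend, submit(),
the request strings) is made up purely for illustration.

/*
 * Toy model (not QEMU code) of the behaviour discussed above: a
 * BlockBackend-like parent postpones requests while its quiesce counter
 * is non-zero and resumes them in the "counter dropped back to 0" branch
 * of drained_end(), as suggested for blk_root_drained_end().
 */
#include <stdio.h>

typedef struct Backend {
    int quiesce_counter;        /* models blk->quiesce_counter */
    const char *queued[16];     /* postponed (queued) requests */
    int n_queued;
} Backend;

static void drained_begin(Backend *blk)
{
    blk->quiesce_counter++;
}

static void drained_end(Backend *blk)
{
    if (--blk->quiesce_counter == 0) {
        /* The place where the postponed requests get resumed. */
        for (int i = 0; i < blk->n_queued; i++) {
            printf("resuming postponed request: %s\n", blk->queued[i]);
        }
        blk->n_queued = 0;
    }
}

static void submit(Backend *blk, const char *req)
{
    if (blk->quiesce_counter > 0) {
        blk->queued[blk->n_queued++] = req;   /* postpone while drained */
        printf("postponing request: %s\n", req);
    } else {
        printf("executing request: %s\n", req);
    }
}

int main(void)
{
    Backend blk = { 0 };

    drained_begin(&blk);            /* drained section around the graph change */
    submit(&blk, "guest write");    /* postponed, as intended */

    /*
     * The buggy ordering described above for bdrv_replace_child_noperm():
     * the parent's drained_end for the old child runs before its
     * drained_begin for the new child, so the counter touches 0 and the
     * write resumes while the graph is still being rewired -- the
     * assertion Denis runs into.
     */
    drained_end(&blk);              /* old child first: counter drops to 0 */
    drained_begin(&blk);            /* new child second: too late */

    drained_end(&blk);              /* actual end of the drained section */
    return 0;
}

Compiled and run, the model postpones the guest write and then resumes
it at the first drained_end() call, i.e. while the "replacement" is
still in progress.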
> >>>>>
> >>>>> So it seems the fix would be to reverse the order and first call
> >>>>> .drained_begin for the new child and only then .drained_end for
> >>>>> the old child. Sounds like a good new test case for
> >>>>> tests/test-bdrv-drain.c, too.
> >>>>
> >>>> Yes, that's true, but it's not enough...
> >>>> In mirror_exit_common() we actively manipulate the block driver
> >>>> states. When we have replaced a node in the snippet you showed, we
> >>>> can't allow the postponed coroutines to run yet, because the block
> >>>> tree isn't ready to receive the requests.
> >>>> To be ready, we need to insert the proper block driver state into
> >>>> the block backend, which is done here:
> >>>>
> >>>> blk_remove_bs(bjob->blk);
> >>>> blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
> >>>> blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);  << << << <<
> >>>>
> >>>> bs_opaque->job = NULL;
> >>>>
> >>>> bdrv_drained_end(src);
> >>>>
> >>>> If the tree isn't ready and we resume the coroutines, we'll end up
> >>>> with the requests landing in the wrong block driver state.
> >>>>
> >>>> So we should explicitly stop all activity on all the block driver
> >>>> states and their parents, and only allow it again when everything
> >>>> is ready to go.
> >>>>
> >>>> Why explicitly? Because the block driver states may belong to
> >>>> different block backends at the moment the manipulation begins.
> >>>>
> >>>> So it seems we need to disable all their contexts until the
> >>>> manipulation ends.
> >>>>
> >>>> Please correct me if I'm wrong.
> >>>>
> >>>>>
> >>>>>> It seems that if the external requests are disabled on the
> >>>>>> context, we can't rely on anything, or we would have to check
> >>>>>> whether the underlying bs and its underlying nodes are ready to
> >>>>>> receive requests, which sounds quite complicated.
> >>>>>> Please correct me if I still don't understand something in that
> >>>>>> routine.
> >>>>>
> >>>>> I think the reason why relying on aio_disable_external() works is
> >>>>> simply that src is also drained, which keeps external events in
> >>>>> the AioContext disabled despite the bug in draining the target
> >>>>> node.
> >>>>>
> >>>>> The bug would become apparent even with aio_disable_external() if
> >>>>> we didn't drain src, or even if we just supported src and target
> >>>>> being in different AioContexts.
> >>>>
> >>>> Why don't we disable all the contexts involved until the end of
> >>>> the block device tree reconstruction?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Denis
> >>>>>
> >>>>> Kevin
> >>>>>
> >>>>
> >>>
> >>
> >
>
> --
> Best,
> Denis
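In the same toy-model terms, the reordering Kevin proposes above
(calling .drained_begin for the new child before .drained_end for the
old child) keeps the parent's counter at one or more across the whole
swap, so the postponed requests stay postponed until the real end of the
drained section. The helper below is only a sketch reusing the
hypothetical Backend type from the model earlier in this thread; it is
not a QEMU function, and the actual change being discussed would go into
bdrv_replace_child_noperm() (with, as Kevin suggests, a regression test
in tests/test-bdrv-drain.c).

/*
 * Continuation of the toy model above (reuses the hypothetical Backend
 * type and its drained_begin()/drained_end() helpers). Kevin's proposed
 * ordering: quiesce the parent through the new child first, then end
 * the drain that came through the old child, so the counter never
 * touches 0 in the middle of the replacement.
 */
static void replace_child(Backend *parent)
{
    drained_begin(parent);   /* .drained_begin for the new child first   */
    drained_end(parent);     /* .drained_end for the old child afterwards */
}

Substituting replace_child() for the two swapped calls in the model's
main() keeps the counter at 1 throughout, and the queued request is only
resumed by the outer drained_end(). Whether that alone is enough for
mirror_exit_common(), or whether the graph also has to be fully re-wired
first as Denis argues, is exactly the open question in the exchange
above.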