ping ping ping ping!!!!

On 09.01.2019 11:18, Denis Plotnikov wrote:
> ping ping!!!
>
> On 18.12.2018 11:53, Denis Plotnikov wrote:
>> ping ping
>>
>> On 14.12.2018 14:54, Denis Plotnikov wrote:
>>>
>>>
>>> On 13.12.2018 15:20, Kevin Wolf wrote:
>>>> Am 13.12.2018 um 12:07 hat Denis Plotnikov geschrieben:
>>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
>>>>>> Am 11.12.2018 um 17:55 hat Denis Plotnikov geschrieben:
>>>>>>>> Why involve the AioContext at all? This could all be kept at the
>>>>>>>> BlockBackend level without extending the layering violation that
>>>>>>>> aio_disable_external() is.
>>>>>>>>
>>>>>>>> BlockBackends get notified when their root node is drained, so
>>>>>>>> hooking things up there should be as easy, if not even easier
>>>>>>>> than in AioContext.
>>>>>>>
>>>>>>> Just want to make sure that I understood correctly what you meant
>>>>>>> by "BlockBackends get notified". Did you mean that bdrv_drain_end
>>>>>>> calls the child's role callback blk_root_drained_end by calling
>>>>>>> bdrv_parent_drained_end?
>>>>>>
>>>>>> Yes, blk_root_drained_begin/end calls are all you need.
>>>>>> Specifically, their adjustments to blk->quiesce_counter that are
>>>>>> already there, and in the 'if (--blk->quiesce_counter == 0)' block
>>>>>> of blk_root_drained_end() we can resume the queued requests.
>>>>> It sounds like it should work that way, but it doesn't, and here is
>>>>> why: when doing mirror we may resume the postponed coroutines too
>>>>> early, while the underlying bs is still protected from writing, and
>>>>> thus we hit the assertion in bdrv_co_write_req_prepare() when a
>>>>> resumed coroutine executes its write request.
>>>>>
>>>>> The thing is that the bs is write-protected before the execution of
>>>>> bdrv_replace_node() in mirror_exit_common(), and bdrv_replace_node()
>>>>> calls bdrv_replace_child_noperm(), which in turn calls
>>>>> child->role->drained_end. One of those callbacks is
>>>>> blk_root_drained_end(), which checks
>>>>> 'if (--blk->quiesce_counter == 0)' and runs the postponed requests
>>>>> (coroutines) if the condition is true.
>>>>
>>>> Hm, so something is messed up with the drain sections in the mirror
>>>> driver. We have:
>>>>
>>>> bdrv_drained_begin(target_bs);
>>>> bdrv_replace_node(to_replace, target_bs, &local_err);
>>>> bdrv_drained_end(target_bs);
>>>>
>>>> Obviously, the intention was to keep the BlockBackend drained during
>>>> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
>>>> inside bdrv_replace_node() when target_bs is drained?
>>>>
>>>> Looking at bdrv_replace_child_noperm(), it seems that the function
>>>> has a bug: even if old_bs and new_bs are both drained, the
>>>> quiesce_counter of the parent reaches 0 for a moment because we call
>>>> .drained_end for the old child first and .drained_begin for the new
>>>> one only later.
>>>>
>>>> So it seems the fix would be to reverse the order and first call
>>>> .drained_begin for the new child and then .drained_end for the old
>>>> child. Sounds like a good new testcase for tests/test-bdrv-drain.c,
>>>> too.
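A minimal sketch of the reordering Kevin suggests above, for illustration
only; the field and callback names (quiesce_counter,
child->role->drained_begin/end) follow what is quoted in this thread and
may not match the actual block.c exactly:

    static void bdrv_replace_child_noperm(BdrvChild *child,
                                          BlockDriverState *new_bs)
    {
        BlockDriverState *old_bs = child->bs;

        /* Begin draining through the new child first ... */
        if (new_bs && new_bs->quiesce_counter && child->role->drained_begin) {
            child->role->drained_begin(child);
        }

        /* ... and only then end draining through the old child, so a
         * parent quiesced via old_bs never sees its quiesce_counter drop
         * to 0 while the graph is being rewired. */
        if (old_bs && old_bs->quiesce_counter && child->role->drained_end) {
            child->role->drained_end(child);
        }

        /* detach from old_bs and attach to new_bs as in the existing code */
    }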
>>> Yes, it's true, but it's not enough...
>>> In mirror_exit_common() we actively manipulate block driver states.
>>> When we replace the node in the snippet you showed, we can't allow
>>> the postponed coroutines to run, because the block tree isn't ready
>>> to receive requests yet.
>>> To be ready, we need to insert the proper block driver state into the
>>> block backend, which is done here:
>>>
>>> blk_remove_bs(bjob->blk);
>>> blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>>> blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
>>>
>>> bs_opaque->job = NULL;
>>>
>>> bdrv_drained_end(src);
>>>
>>> If the tree isn't ready and we resume the coroutines, we'll end up
>>> with the requests landing in the wrong block driver state.
>>>
>>> So we should explicitly stop all activity on all the block driver
>>> states and their parents, and allow it again only when everything is
>>> ready to go.
>>>
>>> Why explicitly? Because the block driver states may belong to
>>> different block backends at the moment the manipulation begins.
>>>
>>> So it seems we need to disable all their contexts until the
>>> manipulation ends.
>>>
>>> Please correct me if I'm wrong.
>>>
>>>>
>>>>> It seems that if external requests are disabled on the context we
>>>>> can't rely on anything, or we would have to check whether the
>>>>> underlying bs and its underlying nodes are ready to receive
>>>>> requests, which sounds quite complicated.
>>>>> Please correct me if I still don't understand something in that
>>>>> routine.
>>>>
>>>> I think the reason why relying on aio_disable_external() works is
>>>> simply because src is also drained, which keeps external events in
>>>> the AioContext disabled despite the bug in draining the target node.
>>>>
>>>> The bug would become apparent even with aio_disable_external() if we
>>>> didn't drain src, or even if we just supported src and target being
>>>> in different AioContexts.
>>>
>>> Why don't we disable all the contexts involved until the end of the
>>> block device tree reconstruction?
>>>
>>> Thanks!
>>>
>>> Denis
>>>>
>>>> Kevin
>>>>
>>>
>>
>
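A minimal sketch of the resume point the discussion keeps coming back to:
postponed requests are woken only when the BlockBackend's last drained
section ends. Only the quiesce_counter handling is taken from the quoted
mails; blk_resume_postponed_requests() is a hypothetical name standing in
for whatever mechanism the series actually uses:

    static void blk_root_drained_end(BdrvChild *child)
    {
        BlockBackend *blk = child->opaque;

        assert(blk->quiesce_counter);
        if (--blk->quiesce_counter == 0) {
            /* The last drained section covering this BlockBackend has
             * ended, i.e. the graph manipulation (bdrv_replace_node(),
             * blk_insert_bs(), ...) is complete, so only now is it safe
             * to wake the postponed request coroutines. */
            blk_resume_postponed_requests(blk); /* hypothetical helper */
        }
    }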
--
Best,
Denis