Am 23.07.2015 um 08:47 hat Markus Armbruster geschrieben: > Alberto Garcia <be...@igalia.com> writes: > > > I've been debugging a couple of problems related to the recently > > merged bdrv_reopen() overhaul code. > > > > 1. bs->children is not updated correctly > > ---------------------------------------- > > The problem is described in this e-mail: > > > > https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg06813.html > > > > In short, changing an image's backing hd does not replace the pointer > > in the bs->children list. > > > > The children list is a feature added recently in 6e93e7c41fdfdee30. > > In addition to bs->backing_hd and bs->file it also includes other > > driver-specific children for cases like Quorum. > > I was involved in discussions leading up to the children list, but I > couldn't find the time to review actual patches, so I only have > high-level remarks at this time. > > Obvious: the BDS graph is linked together with pointers. > > Two pointers are special: BDS members @backing_hd and @file. By > convention: > > 1. COW drivers use @file for the delta and @backing_hd for the base. > > Example: "qcow2" uses them that way. > > 2. Non-COW drivers don't use @backing_hd (it remains null), and they may > use @file. > > Example: "raw" uses @file to point to its underlying BDS. > Example: "file" doesn't use @file (it remains null). > > 3. If a driver needs children that don't fit the above, it has suitable > members in its private state. > > Example "blkverify" has a member @test_file. > > Due to 3., generic code cannot find all children. This is unfortunate. > > Possible solutions: > > A. Driver callbacks to help iterate over children. > > B. Change the data structure so that generic code can iterate over > children. > > C. Augment the data structure so that generic code can iterate over > children. > > Solution A seems relatively straightforward to do, but clumsy. B is a > lot of code churn, and the result might be less clear, say bs->child[0] > and bs->child[1] instead of bs->file and bs->backing_hd. C begs > consistency questions. > > Our children list is C. How do we ensure consistency?
By doing B. > One way to make ensuring it easier is another level of indirection: have > the children list nodes point to the child pointer instead of the child. > Then we need to mess with the children list only initially, and when > child pointers get born or die. My series does it the other way round: bs->backing_file becomes a pointer to BdrvChild. > Without that, we need to mess with it when child pointers change. We > can require all changes to go through a helper function that updates the > children list. Perhaps something like > > void bdrv_set_child_internal(BlockDriverState *bs, size_t offs, > BlockDriverState *new_child) > { > BlockDriverState **child_ptr = (BlockDriverState **)((char *)bs + > offs); > > if (*child_ptr) { > remove *child_ptr from bs->children > } > *child_ptr = new_child > if (new_child) { > add *child_ptr to bs->children > } > } > > #define bdrv_set_child(bs, member, new_child) \ > bdrv_set_child_internal(bs, offsetof(bs, member), new_child) > // build time assertion bs->member is of type BlockDriverState * omitted > > Caution: when the same child occurs multiple times, this may destroy any > association of "actual child pointer" with "position in child list". This looks much too error-prone to me. We have to avoid duplication. > Aside: why is it a children list and not an array? Are members deleted > or inserted that often? Not often, but not all at the same time either. Implementation detail, feel free to convert it to reallocating an array each time. > > 2. bdrv_reopen_queue() includes the complete backing chain > > ---------------------------------------------------------- > > Calling bdrv_reopen() on a particular BlockDriverState now adds its > > whole backing chain to the queue (formerly I think it was just > > bs->file). > > I'm not sure I understand. Can you give an example with before and > after behavior? Backing chain: base <- snap Start with cache=none set for snap. base inherits the option. Reopen with cache=writeback. Before the change, base remained at cache=none, afterwards it inherits the new setting cache=writeback. As we defined that reopen should adjust the inherited options of child nodes accordingly, this was a bug fix. Kevin