On 11/27/2017 06:29 PM, Kevin Wolf wrote:
> Am 23.11.2017 um 18:57 hat Fam Zheng geschrieben:
>> Jeff's block job patch made the latent drain bug visible, and I find
>> that this patch, which by itself also makes some sense, can hide it
>> again. :) With it applied we are at least back to a state where
>> patchew's iotests (make docker-test-block@fedora) can pass.
>>
>> The real bug is that in the middle of bdrv_parent_drained_end(), bs's
>> parent list changes. One drained_end call before the mirror_exit()
>> already did one blk_root_drained_end(); a second drained_end on an
>> updated parent node can do the same blk_root_drained_end() again,
>> making it unbalanced with blk_root_drained_begin(). This is shown by
>> the following three backtraces as captured by rr with a crashed
>> "qemu-img commit", essentially the same as in the failed iotest 020:
>
> My conclusion about what really happens in 020 is that we have a graph
> like this:
>
>                          mirror target BB --+
>                                             |
>                                             v
> qemu-img BB -> mirror_top_bs -> overlay -> base
>
> bdrv_drained_end(base) results in it being available for requests
> again, so it calls bdrv_parent_drained_end() for overlay. While
> draining itself, the mirror job completes and changes the BdrvChild
> between mirror_top_bs and overlay (which is currently being drained)
> to point to base instead. After returning, QLIST_FOREACH() continues
> to iterate the parents of base instead of those of overlay, resulting
> in a second blk_drained_end() for the mirror target BB.
>
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
>
> However, I'm not sure whether all problems with the graph change can
> be solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing them
> would require some more serious changes to the block jobs that delay
> their completion until after bdrv_drain_end() has finished (not sure
> how to even get a callback at that point...)
>
> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition; it seems to attack a real problem.
>
> Kevin
>
>
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
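To make the failure mode above concrete, here is a minimal,
self-contained sketch -- not QEMU code. The list macros just mirror
the ones in QEMU's include/qemu/queue.h, and complete_job() plus the
child names are hypothetical stand-ins for the mirror completion.
Moving the visited child onto base's parent list mid-loop drags plain
QLIST_FOREACH() into base's list, producing the spurious extra visit;
QLIST_FOREACH_SAFE() caches the successor before running the loop
body, so it stays on overlay's list:

#include <stdio.h>

/* Macros copied in the style of QEMU's include/qemu/queue.h. */
#define QLIST_HEAD(name, type)                                          \
struct name { struct type *lh_first; }

#define QLIST_ENTRY(type)                                               \
struct { struct type *le_next; struct type **le_prev; }

#define QLIST_INSERT_HEAD(head, elm, field) do {                        \
    if (((elm)->field.le_next = (head)->lh_first) != NULL)              \
        (head)->lh_first->field.le_prev = &(elm)->field.le_next;        \
    (head)->lh_first = (elm);                                           \
    (elm)->field.le_prev = &(head)->lh_first;                           \
} while (0)

#define QLIST_REMOVE(elm, field) do {                                   \
    if ((elm)->field.le_next != NULL)                                   \
        (elm)->field.le_next->field.le_prev = (elm)->field.le_prev;     \
    *(elm)->field.le_prev = (elm)->field.le_next;                       \
} while (0)

#define QLIST_FOREACH(var, head, field)                                 \
    for ((var) = (head)->lh_first; (var); (var) = (var)->field.le_next)

#define QLIST_FOREACH_SAFE(var, head, field, next_var)                  \
    for ((var) = (head)->lh_first;                                      \
         (var) && ((next_var) = (var)->field.le_next, 1);               \
         (var) = (next_var))

/* Stand-in for BdrvChild: one parent link on a node's ->parents list. */
struct child {
    const char *name;
    QLIST_ENTRY(child) next_parent;
};

QLIST_HEAD(parent_list, child);

/* Simulate the mirror job completing mid-drain: the BdrvChild that
 * pointed at overlay is re-parented onto base. */
static void complete_job(struct child *c, struct parent_list *new_parent)
{
    QLIST_REMOVE(c, next_parent);
    QLIST_INSERT_HEAD(new_parent, c, next_parent);
}

int main(void)
{
    struct parent_list overlay = { NULL }, base = { NULL };
    struct child mirror = { "mirror_top_bs child" };
    struct child target = { "mirror target BB child" };
    struct child *c, *next;

    /* base already has the mirror target BB as a parent; overlay has
     * the (about to be re-pointed) child of mirror_top_bs. */
    QLIST_INSERT_HEAD(&base, &target, next_parent);
    QLIST_INSERT_HEAD(&overlay, &mirror, next_parent);

    /* Unsafe: after complete_job(), mirror.le_next points into base's
     * list, so the loop "continues to iterate the parents of base" and
     * visits the target BB's child -- the spurious extra drained_end(). */
    QLIST_FOREACH(c, &overlay, next_parent) {
        printf("QLIST_FOREACH visits:      %s\n", c->name);
        if (c == &mirror) {
            complete_job(c, &base);
        }
    }

    /* Reset: move the child back onto overlay's parent list. */
    complete_job(&mirror, &overlay);

    /* Safe: the successor (NULL here) is cached before the body runs,
     * so moving the child mid-loop cannot drag us into base's list. */
    QLIST_FOREACH_SAFE(c, &overlay, next_parent, next) {
        printf("QLIST_FOREACH_SAFE visits: %s\n", c->name);
        if (c == &mirror) {
            complete_job(c, &base);
        }
    }
    return 0;
}

Compiled and run, the unsafe loop prints the target BB child as a
second visit, while the safe loop visits only the child it started
with.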
With this patch applied to 5e19aed5, I see the following failures
(still?):

raw:
055 (pause_job timeouts)
109 (apparently a discrepancy over whether busy should be true/false)

qcow2:
056 (hang)
087 (lacking crypto, normal for me)
141 (unexpected timeout/hang)
176 (SIGSEGV)
188 (lacking crypto, normal for me)
189 (lacking crypto, normal for me)
198 (lacking crypto, I guess this is normal now)

I'm on my way to the gym for the evening; I will try to investigate
later tonight. I'm not worried about 087, 188, 189 or 198.

Anyway, as this micro-patchlet does fix observable problems with
iotest 020:

Tested-by: John Snow <js...@redhat.com>