On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> Am 23.11.2017 um 18:57 hat Fam Zheng geschrieben:
> > Jeff's block job patch made the latent drain bug visible, and I find
> > this patch, which by itself also makes some sense, can hide it again. :)
> > With it applied we are at least back to the state where patchew's
> > iotests (make docker-test-block@fedora) can pass.
> >
> > The real bug is that in the middle of bdrv_parent_drained_end(), bs's
> > parent list changes. One drained_end call before the mirror_exit()
> > already did one blk_root_drained_end(); a second drained_end on an
> > updated parent node can do the same blk_root_drained_end() again,
> > making it unbalanced with blk_root_drained_begin(). This is shown by
> > the following three backtraces as captured by rr with a crashed
> > "qemu-img commit", essentially the same as in the failed iotest 020:
>
> My conclusion about what really happens in 020 is that we have a graph
> like this:
>
> mirror target BB --------------------------+
>                                            |
>                                            v
> qemu-img BB -> mirror_top_bs -> overlay -> base
>
> bdrv_drained_end(base) results in it being available for requests
> again, so it calls bdrv_parent_drained_end() for overlay. While
> draining itself, the mirror job completes and changes the BdrvChild
> between mirror_top_bs and overlay (which is currently being drained) to
> point to base instead. After returning, QLIST_FOREACH() continues to
> iterate the parents of base instead of those of overlay, resulting in a
> second blk_drained_end() for the mirror target BB.
>
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
>
> However, I'm not sure if all problems with the graph change can be
> solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing it would
> require some more serious changes to the block jobs that delay their
> completion until after bdrv_drain_end() has finished (not sure how to
> even get a callback at that point...)
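For illustration, here is a toy model of that iteration hazard outside of
QEMU. Nothing in it is QEMU code: the List/Node types, the names and the
callback are made up, and FOREACH/FOREACH_SAFE are simplified stand-ins for
QLIST_FOREACH()/QLIST_FOREACH_SAFE() from include/qemu/queue.h.

#include <stdbool.h>
#include <stdio.h>

/* Minimal singly linked list, just enough to model bs->parents. */
typedef struct Node {
    const char *name;
    bool moves_on_callback;   /* models the BdrvChild the job re-parents */
    bool moved;
    struct Node *next;
} Node;

typedef struct List {
    Node *first;
} List;

/* Simplified stand-ins for QLIST_FOREACH / QLIST_FOREACH_SAFE. */
#define FOREACH(var, head) \
    for ((var) = (head)->first; (var); (var) = (var)->next)
#define FOREACH_SAFE(var, head, nvar) \
    for ((var) = (head)->first; (var) && ((nvar) = (var)->next, 1); \
         (var) = (nvar))

static List overlay_parents;   /* parents of "overlay" */
static List base_parents;      /* parents of "base" */

static void list_remove(List *l, Node *n)
{
    Node **p = &l->first;

    while (*p && *p != n) {
        p = &(*p)->next;
    }
    if (*p) {
        *p = n->next;
    }
}

static void list_insert_head(List *l, Node *n)
{
    n->next = l->first;
    l->first = n;
}

/* Stand-in for c->role->drained_end(): for the mirror_top_bs child it also
 * lets the "job" complete, which re-parents that child from overlay to base
 * while the caller is still walking overlay's parent list. */
static void drained_end_cb(Node *n)
{
    printf("drained_end on: %s\n", n->name);
    if (n->moves_on_callback && !n->moved) {
        n->moved = true;
        list_remove(&overlay_parents, n);
        list_insert_head(&base_parents, n);
    }
}

int main(void)
{
    Node mirror_child = { "child of mirror_top_bs", true, false, NULL };
    Node target_bb    = { "mirror target BB", false, false, NULL };
    Node *c, *next;

    /* overlay has one parent; the target BB is a parent of base only. */
    list_insert_head(&overlay_parents, &mirror_child);
    list_insert_head(&base_parents, &target_bb);

    printf("plain FOREACH over overlay's parents:\n");
    FOREACH(c, &overlay_parents) {
        /* After the callback, c->next points into base's parent list, so
         * the walk continues there and the target BB gets a spurious
         * drained_end. */
        drained_end_cb(c);
    }

    /* Reset the toy graph and repeat with the _SAFE variant. */
    mirror_child = (Node){ "child of mirror_top_bs", true, false, NULL };
    target_bb    = (Node){ "mirror target BB", false, false, NULL };
    overlay_parents.first = base_parents.first = NULL;
    list_insert_head(&overlay_parents, &mirror_child);
    list_insert_head(&base_parents, &target_bb);

    printf("FOREACH_SAFE over overlay's parents:\n");
    FOREACH_SAFE(c, &overlay_parents, next) {
        /* The next pointer was cached before the callback ran, so the walk
         * ends after overlay's only real parent. */
        drained_end_cb(c);
    }
    return 0;
}

The plain walk calls the callback a second time for a node that was never a
parent of overlay, which is the unbalanced blk_root_drained_end() described
above; the _SAFE variant only protects against the current element being
moved, which is why it fixes this particular instance but not graph changes
in general.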
That is at least part of what is causing the segfaults that I am still
seeing (after your patch): We enter bdrv_drain_recurse(), and the BDS has
been reaped:

Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault.
0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
92          assert(mutex->initialized);
#0  0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
#1  0x00000001001450bf in aio_context_release (ctx=0x7676767676767676) at util/async.c:507
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
#3  0x000000010009d949 in bdrv_drained_end (bs=0x100843270) at block/io.c:297
#4  0x000000010002e705 in bdrv_child_cb_drained_end (child=0x100870c40) at block.c:822
#5  0x000000010009cdc6 in bdrv_parent_drained_end (bs=0x100863b10) at block/io.c:60
#6  0x000000010009d938 in bdrv_drained_end (bs=0x100863b10) at block/io.c:296
#7  0x000000010009d9d5 in bdrv_drain (bs=0x100863b10) at block/io.c:322
#8  0x000000010008c39c in blk_drain (blk=0x100881220) at block/block-backend.c:1523
#9  0x000000010009a85e in mirror_drain (job=0x10088df50) at block/mirror.c:996
#10 0x0000000100037eca in block_job_drain (job=0x10088df50) at blockjob.c:187
#11 0x000000010003856d in block_job_finish_sync (job=0x10088df50, finish=0x100038926 <block_job_complete>, errp=0x7fffffffd1d8) at blockjob.c:378
#12 0x0000000100038b80 in block_job_complete_sync (job=0x10088df50, errp=0x7fffffffd1d8) at blockjob.c:544
#13 0x0000000100023de5 in run_block_job (job=0x10088df50, errp=0x7fffffffd1d8) at qemu-img.c:872
#14 0x0000000100024305 in img_commit (argc=3, argv=0x7fffffffd390) at qemu-img.c:1034
#15 0x000000010002ccd1 in main (argc=3, argv=0x7fffffffd390) at qemu-img.c:4763

(gdb) f 2
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
(gdb) list
196
197         /* Ensure any pending metadata writes are submitted to bs->file. */
198         bdrv_drain_invoke(bs, begin);
199
200         /* Wait for drained requests to finish */
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
202
203         QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
204             BlockDriverState *bs = child->bs;
205             bool in_main_loop =
(gdb) p *bs
$1 = {
  open_flags = 8835232,
  read_only = true,
  encrypted = false,
  sg = false,
  probed = false,
  force_share = 56,
  implicit = 59,
  drv = 0x0,
  opaque = 0x0,
  aio_context = 0x7676767676767676,
  aio_notifiers = {
    lh_first = 0x7676767676767676
  },
  [...]

> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition, but seems to attack a real problem.
>
> Kevin
>
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
>
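And regarding the crash in bdrv_drain_recurse() further up, where the BDS
has already been reaped while we are still in BDRV_POLL_WHILE(): just to
make that failure mode concrete, here is a minimal, self-contained sketch of
the kind of reference pinning that would be needed around the parent
callbacks, analogous to bracketing them with bdrv_ref()/bdrv_unref(). Again
nothing below is QEMU code (the Obj type and all names are invented), and
whether the real bdrv_ref()/bdrv_unref() may be used at that point (BQL,
iothreads) is exactly the open question.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal refcounted object standing in for a BlockDriverState. */
typedef struct Obj {
    int refcnt;
    int in_flight;
} Obj;

static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));

    o->refcnt = 1;
    return o;
}

static void obj_ref(Obj *o)
{
    o->refcnt++;
}

static void obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        printf("object freed\n");
        free(o);
    }
}

/* Stand-in for a parent's drained_end callback whose side effect (job
 * completion, graph change) drops what may be the last reference to o. */
static void parent_drained_end(Obj *o)
{
    obj_unref(o);
}

/* Stand-in for bdrv_drained_end(): without the ref/unref bracket, the poll
 * on o->in_flight after the callback would read freed memory -- the same
 * shape as BDRV_POLL_WHILE() running on a reaped BDS in the backtrace. */
static void drained_end(Obj *o)
{
    obj_ref(o);                   /* pin o across the callbacks */
    parent_drained_end(o);
    while (o->in_flight > 0) {    /* safe: we still hold a reference */
        /* poll... */
    }
    obj_unref(o);                 /* may free o now that we are done */
}

int main(void)
{
    drained_end(obj_new());
    return 0;
}

Pinning only keeps the pointer valid, of course; it does not by itself make
a graph change in the middle of the drain sequence correct, which is the
harder question raised above.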