On 22.07.19 15:30, Max Reitz wrote:
> Hi,
> 
> I noted that test-bdrv-drain sometimgs hangs (very rarely, though), and
> tried to write a test that triggers the issue.  I failed to do so (there
> is a good reason for that, see patch 1), but on my way I noticed that
> calling bdrv_set_aio_context_ignore() from any AioContext but the main
> one is a bad idea.  Hence patch 2.
> 
> Anyway, I found the problem, which is fixed by patch 1 -- I think it’s
> rather obvious.  There is no dedicated test because I don’t think it’s
> possible to write one, as I explain there.
> 
> 
> Max Reitz (2):
>   block: Dec. drained_end_counter before bdrv_wakeup
>   block: Only the main loop can change AioContexts
> 
>  include/block/block.h |  8 +++-----
>  block.c               | 13 ++++++++-----
>  block/io.c            |  5 ++---
>  3 files changed, 13 insertions(+), 13 deletions(-)

Sorry, applied to my block branch.


“Sorry” obviously because I didn’t really give much time to review this
series.  My justification is:

- rc2’s tomorrow.  I know the current code is broken, so I’d rather take
  the chance to have a fixed rc2 than to know it to be broken and force
  an rc4 by sending this for rc3.

- Patch 1 looks really obvious and simple to me.  It makes sense to
  decrement the drained_end_counter exactly where .done is set to true.

- Patch 2 is not so obvious.  But the only dangerous change it
  introduces is an assertion that bdrv_set_aio_context_ignore() is
  called from the main loop.  Right now, if it is called from anywhere
  but the main loop, we *will* run into assertions elsewhere:
  - bdrv_drained_begin() does a BDRV_POLL_WHILE(bs, ...).  This only
    works from the main context or bs's context.
  - bdrv_drained_end() does the same, but now bs's context has changed.
  Ergo, right now (on master), you can run bdrv_set_aio_context_ignore()
  only from the main loop anyway.  The new assertion only makes that
  fact more explicit.

Now this makes it look like before e037c09c78520, you could run
bdrv_set_aio_context_ignore() from the old context (like the comment
therein said).  But that’s wrong.  Even before e037c09c78520,
bdrv_drained_end() didn’t work for mixed-context trees unless you call
it from the main context: It schedules the drained_end callbacks in the
respective node's context, but then it polls them from the original
context.  So if you modify e.g. test_set_aio_context() in
test-bdrv-drain to run bdrv_try_set_aio_context() from a BH in the old
context, you will see a failed assertion in bdrv_drain_invoke()'s
BDRV_POLL_WHILE.  (I’ve attached a diff for use on e037c09c78520^.)

Therefore, I’m confident this series doesn’t make things worse, which is
why I’m taking it without reviews.

Max
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 3503ce3b69..cf1194754e 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1497,6 +1497,19 @@ static void test_append_to_drained(void)
     blk_unref(blk);
 }
 
+typedef struct BHParams {
+    BlockDriverState *bs;
+    AioContext *target;
+    bool done;
+} BHParams;
+
+static void bh_fun(void *opaque)
+{
+    BHParams *bhp = opaque;
+    bdrv_try_set_aio_context(bhp->bs, bhp->target, &error_abort);
+    bhp->done = true;
+}
+
 static void test_set_aio_context(void)
 {
     BlockDriverState *bs;
@@ -1504,22 +1517,38 @@ static void test_set_aio_context(void)
     IOThread *b = iothread_new();
     AioContext *ctx_a = iothread_get_aio_context(a);
     AioContext *ctx_b = iothread_get_aio_context(b);
+    BHParams bhp;
 
     bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
                               &error_abort);
 
     bdrv_drained_begin(bs);
-    bdrv_try_set_aio_context(bs, ctx_a, &error_abort);
+    bhp = (BHParams){ bs, ctx_a };
+    aio_bh_schedule_oneshot(qemu_get_aio_context(), bh_fun, &bhp);
+    while (!bhp.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
 
     aio_context_acquire(ctx_a);
     bdrv_drained_end(bs);
 
     bdrv_drained_begin(bs);
-    bdrv_try_set_aio_context(bs, ctx_b, &error_abort);
+
+    bhp = (BHParams){ bs, ctx_b };
+    aio_bh_schedule_oneshot(ctx_a, bh_fun, &bhp);
     aio_context_release(ctx_a);
+    while (!bhp.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
     aio_context_acquire(ctx_b);
-    bdrv_try_set_aio_context(bs, qemu_get_aio_context(), &error_abort);
+    bhp = (BHParams){ bs, qemu_get_aio_context() };
+    aio_bh_schedule_oneshot(ctx_b, bh_fun, &bhp);
     aio_context_release(ctx_b);
+    while (!bhp.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
     bdrv_drained_end(bs);
 
     bdrv_unref(bs);

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to