On 22.07.19 15:30, Max Reitz wrote: > Hi, > > I noted that test-bdrv-drain sometimgs hangs (very rarely, though), and > tried to write a test that triggers the issue. I failed to do so (there > is a good reason for that, see patch 1), but on my way I noticed that > calling bdrv_set_aio_context_ignore() from any AioContext but the main > one is a bad idea. Hence patch 2. > > Anyway, I found the problem, which is fixed by patch 1 -- I think it’s > rather obvious. There is no dedicated test because I don’t think it’s > possible to write one, as I explain there. > > > Max Reitz (2): > block: Dec. drained_end_counter before bdrv_wakeup > block: Only the main loop can change AioContexts > > include/block/block.h | 8 +++----- > block.c | 13 ++++++++----- > block/io.c | 5 ++--- > 3 files changed, 13 insertions(+), 13 deletions(-)
Sorry, applied to my block branch. “Sorry” obviously because I didn’t really give much time to review this series. My justification is: - rc2’s tomorrow. I know the current code is broken, so I’d rather take the chance to have a fixed rc2 than to know it to be broken and force an rc4 by sending this for rc3. - Patch 1 looks really obvious and simple to me. It makes sense to decrement the drained_end_counter exactly where .done is set to true. - Patch 2 is not so obvious. But the only dangerous change it introduces is an assertion that bdrv_set_aio_context_ignore() is called from the main loop. Right now, if it is called from anywhere but the main loop, we *will* run into assertions elsewhere: - bdrv_drained_begin() does a BDRV_POLL_WHILE(bs, ...). This only works from the main context or bs's context. - bdrv_drained_end() does the same, but now bs's context has changed. Ergo, right now (on master), you can run bdrv_set_aio_context_ignore() only from the main loop anyway. The new assertion only makes that fact more explicit. Now this makes it look like before e037c09c78520, you could run bdrv_set_aio_context_ignore() from the old context (like the comment therein said). But that’s wrong. Even before e037c09c78520, bdrv_drained_end() didn’t work for mixed-context trees unless you call it from the main context: It schedules the drained_end callbacks in the respective node's context, but then it polls them from the original context. So if you modify e.g. test_set_aio_context() in test-bdrv-drain to run bdrv_try_set_aio_context() from a BH in the old context, you will see a failed assertion in bdrv_drain_invoke()'s BDRV_POLL_WHILE. (I’ve attached a diff for use on e037c09c78520^.) Therefore, I’m confident this series doesn’t make things worse, which is why I’m taking it without reviews. Max
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index 3503ce3b69..cf1194754e 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -1497,6 +1497,19 @@ static void test_append_to_drained(void) blk_unref(blk); } +typedef struct BHParams { + BlockDriverState *bs; + AioContext *target; + bool done; +} BHParams; + +static void bh_fun(void *opaque) +{ + BHParams *bhp = opaque; + bdrv_try_set_aio_context(bhp->bs, bhp->target, &error_abort); + bhp->done = true; +} + static void test_set_aio_context(void) { BlockDriverState *bs; @@ -1504,22 +1517,38 @@ static void test_set_aio_context(void) IOThread *b = iothread_new(); AioContext *ctx_a = iothread_get_aio_context(a); AioContext *ctx_b = iothread_get_aio_context(b); + BHParams bhp; bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR, &error_abort); bdrv_drained_begin(bs); - bdrv_try_set_aio_context(bs, ctx_a, &error_abort); + bhp = (BHParams){ bs, ctx_a }; + aio_bh_schedule_oneshot(qemu_get_aio_context(), bh_fun, &bhp); + while (!bhp.done) { + aio_poll(qemu_get_aio_context(), true); + } aio_context_acquire(ctx_a); bdrv_drained_end(bs); bdrv_drained_begin(bs); - bdrv_try_set_aio_context(bs, ctx_b, &error_abort); + + bhp = (BHParams){ bs, ctx_b }; + aio_bh_schedule_oneshot(ctx_a, bh_fun, &bhp); aio_context_release(ctx_a); + while (!bhp.done) { + aio_poll(qemu_get_aio_context(), true); + } + aio_context_acquire(ctx_b); - bdrv_try_set_aio_context(bs, qemu_get_aio_context(), &error_abort); + bhp = (BHParams){ bs, qemu_get_aio_context() }; + aio_bh_schedule_oneshot(ctx_b, bh_fun, &bhp); aio_context_release(ctx_b); + while (!bhp.done) { + aio_poll(qemu_get_aio_context(), true); + } + bdrv_drained_end(bs); bdrv_unref(bs);
signature.asc
Description: OpenPGP digital signature