On Tue, Jul 28, 2015 at 11:31 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
> <cornelia.h...@de.ibm.com> wrote:
>> On Tue, 28 Jul 2015 09:34:46 +0100
>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>
>>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
>>> > On Tue, 28 Jul 2015 09:07:00 +0200
>>> > Cornelia Huck <cornelia.h...@de.ibm.com> wrote:
>>> >
>>> > > On Mon, 27 Jul 2015 17:33:37 +0100
>>> > > Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>> > >
>>> > > > See Patch 2 for details on the deadlock after two
>>> > > > aio_context_acquire() calls
>>> > > > race. This caused dataplane to hang on startup.
>>> > > >
>>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
>>> > > >
>>> > > > Stefan Hajnoczi (2):
>>> > > >   AioContext: avoid leaking BHs on cleanup
>>> > > >   AioContext: force event loop iteration using BH
>>> > > >
>>> > > >  async.c             | 29 +++++++++++++++++++++++++++--
>>> > > >  include/block/aio.h |  3 +++
>>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
>>> > > >
>>> > >
>>> > > Just gave this a try: The stripped-down guest that hangs during startup
>>> > > on master is working fine with these patches applied, and my full setup
>>> > > works as well.
>>> > >
>>> > > So,
>>> > >
>>> > > Tested-by: Cornelia Huck <cornelia.h...@de.ibm.com>
>>> >
>>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
>>> > get
>>> >
>>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize:
>>> > Assertion `ctx->first_bh->deleted' failed.
>>>
>>> Please pretty-print ctx->first_bh in gdb. In particular, which function
>>> is ctx->first_bh->cb pointing to?
>>
>> (gdb) p/x *(QEMUBH *)ctx->first_bh
>> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
>> 0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
>>
>> cb is pointing at spawn_thread_bh_fn.
>>
>>>
>>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
>>> couldn't trigger the assertion failure.
>>
>> I use the old x-data-plane attribute; if I turn it off, I don't hit the
>> assertion.
>
> Thanks. I understand how to reproduce it now: use -drive aio=threads
> and do I/O during managedsave.
>
> I suspect there are more cases of this. We need to clean it up during QEMU
> 2.5.
>
> For now let's continue leaking these BHs as we've always done.

Actually, this case can be fixed in the patch by moving thread_pool_free()
before the BH cleanup loop. But I still fear other parts of QEMU may be
leaking BHs, and we should use a full release cycle to weed them out before
introducing the assertion.

Stefan
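For illustration only: a simplified model of why the ordering matters. The BH that spawn_thread_bh_fn belongs to is owned by the thread pool, so it is only marked deleted when the pool is freed; if the BH cleanup loop runs first, it finds a live BH and the assertion fires. The types and functions below (Ctx, BH, Pool, bh_new(), pool_free(), ctx_finalize(), ...) are hypothetical stand-ins for AioContext, QEMUBH and the thread pool, not QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMUBH: deletion only marks the BH,
 * the owning context reclaims the memory during finalize. */
typedef struct BH {
    struct BH *next;
    bool deleted;
} BH;

/* Hypothetical stand-in for AioContext: a singly linked list of BHs. */
typedef struct Ctx {
    BH *first_bh;
} Ctx;

/* Hypothetical thread pool that owns one BH in the context. */
typedef struct Pool {
    BH *completion_bh;
} Pool;

static BH *bh_new(Ctx *ctx)
{
    BH *bh = calloc(1, sizeof(*bh));
    if (!bh) {
        abort();
    }
    bh->next = ctx->first_bh;   /* push onto the head of the list */
    ctx->first_bh = bh;
    return bh;
}

/* Mark only; memory is freed later by ctx_finalize(). */
static void bh_delete(BH *bh)
{
    bh->deleted = true;
}

static Pool *pool_new(Ctx *ctx)
{
    Pool *pool = malloc(sizeof(*pool));
    if (!pool) {
        abort();
    }
    pool->completion_bh = bh_new(ctx);
    return pool;
}

/* Freeing the pool is what deletes the BH it owns. */
static void pool_free(Pool *pool)
{
    bh_delete(pool->completion_bh);
    free(pool);
}

/* Returns how many BHs were reclaimed. Every BH must be deleted by now. */
static int ctx_finalize(Ctx *ctx, Pool *pool)
{
    int freed = 0;

    /* Release the pool first so its BH is marked deleted before the
     * cleanup loop checks it; swapping these two steps trips the assert. */
    pool_free(pool);

    while (ctx->first_bh) {
        BH *next = ctx->first_bh->next;
        assert(ctx->first_bh->deleted);
        free(ctx->first_bh);
        ctx->first_bh = next;
        freed++;
    }
    return freed;
}
```

With one pool-owned BH and one BH the caller has already deleted, ctx_finalize() reclaims both and leaves the list empty. The model deliberately leaves out locking (the real loop runs under the context's BH lock) and scheduled/idle state.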