On Thu 21 Mar 2019 03:51:12 PM CET, Alberto Garcia <be...@igalia.com> wrote:
> I was checking the tests that run commit and stream in parallel in
> 030, but they do commit on the upper images and stream on the lower
> ones, so that's safe. I'll try to run them the other way around
> because we might have a problem there.

I considered these scenarios with the following backing chain:

   E <- D <- C <- B <- A

1) stream from C to A, then commit from C to E

   This fails because qmp_block_commit() checks for op blockers in C's
   overlay (B), which is blocked by the stream block job.

   ("Node 'B' is busy: block device is in use by block job: stream")

2) commit from C to E, then stream from C to A

   This fails because the commit job inserts a filter between C and B
   and the bdrv_freeze_backing_chain(bs, base) call in stream_start()
   fails.

However! I hit this crash on a couple of occasions. I believe that it
happens if the commit job finishes before block_stream, but I need to
debug it further to see why the previous error didn't happen.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000559aca6e745d in stream_prepare (job=0x559acdafad70) at block/stream.c:80
80          base_fmt = base->drv->format_name;
(gdb) print base
$1 = (BlockDriverState *) 0x559acd070240
(gdb) print base->drv
$2 = (BlockDriver *) 0xb5b5b5b5b5b5b5b5
(gdb) bt
#0  0x0000559aca6e745d in stream_prepare (job=0x559acdafad70) at block/stream.c:80
#1  0x0000559aca973a40 in job_prepare (job=0x559acdafad70) at job.c:771
#2  0x0000559aca9722fd in job_txn_apply (txn=0x559acd01e6d0, fn=0x559aca973a03 <job_prepare>) at job.c:146
#3  0x0000559aca973ad2 in job_do_finalize (job=0x559acdafad70) at job.c:788
#4  0x0000559aca973ca0 in job_completed_txn_success (job=0x559acdafad70) at job.c:842
#5  0x0000559aca973d3d in job_completed (job=0x559acdafad70) at job.c:855
#6  0x0000559aca973d8c in job_exit (opaque=0x559acdafad70) at job.c:874
#7  0x0000559acaa99c55 in aio_bh_call (bh=0x559acd3247f0) at util/async.c:90
#8  0x0000559acaa99ced in aio_bh_poll (ctx=0x559accfb9a30) at util/async.c:118
#9  0x0000559acaa9ebc0 in aio_dispatch (ctx=0x559accfb9a30) at util/aio-posix.c:460
#10 0x0000559acaa9a088 in aio_ctx_dispatch (source=0x559accfb9a30, callback=0x0, user_data=0x0) at util/async.c:261
#11 0x00007f7d8e7787f7 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x0000559acaa9d4bf in glib_pollfds_poll () at util/main-loop.c:222
#13 0x0000559acaa9d539 in os_host_main_loop_wait (timeout=0) at util/main-loop.c:245
#14 0x0000559acaa9d63e in main_loop_wait (nonblocking=0) at util/main-loop.c:521
#15 0x0000559aca6c0ace in main_loop () at vl.c:1969
#16 0x0000559aca6c7db3 in main (argc=18, argv=0x7ffe11ee6d58, envp=0x7ffe11ee6df0) at vl.c:4589

So we need to look into this :( but I'd say that it seems that stream
should not need 'base' at all, just the node on top of it.

Berto