On 2014/7/2 20:19, Paolo Bonzini wrote:
> On 02/07/2014 13:57, ChenLiang wrote:
>>>> Hmm, dbs->in_cancel will always be true. Although this will avoid freeing
>>>> dbs in dma_complete, it may be a mistake.
>>>
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I
>>> wanted to avoid the qemu_aio_release from dma_complete. This was in case
>>> of a recursive call to dma_complete. But I don't see how that recursive
>>> call could happen outside the "if (dbs->acb)"; and inside the "if" the
>>> protection is there already.
>>>
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather
>>> than just the second?
>>
>> (gdb) bt
>> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9 0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
>
> And the second is
>
> #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>
> This explains why my patch "fixes" the bug. It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
>
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel. This is the real bug.
>
> What is your version of QEMU? I cannot see anywhere that bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it
> with qemu.git master?
>
> Paolo
Backtraces from the qemu master branch:

Program received signal SIGABRT, Aborted.
0x00007fd548355b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fd548355b55 in raise () from /lib64/libc.so.6
#1 0x00007fd548357131 in abort () from /lib64/libc.so.6
#2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
#3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
#5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
#6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
#7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
#8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
#9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
#10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
#11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
#12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
#14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
#15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
#16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
#17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526

(gdb) bt
#0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
#1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
#2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
#3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
#4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
#5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
#6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
#7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
#8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
#9 0x00007f86412ba100 in main_loop () at vl.c:2011
#10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526

BTW, would it be better to rename dbs->in_cancel to dbs->canceled?

Best regards,
Chenliang