On 03/27/2017 06:45 PM, Max Reitz wrote: > @Subject: Do you mean "PATCH for-2.9?"? Because "RFC" to me means > "please do not merge". ;-) > > I wouldn't mind a change like this to go into 2.9. I am quite sure that problem is here but unsure about the place. Here I use term 'RFC' as 'Can we start the discussion'?
> On 27.03.2017 17:35, Denis V. Lunev wrote: >> Recently we expirience hang with iothreads enabled with the following >> call trace: >> Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)): >> 0 ppoll () from /lib64/libc.so.6 >> 2 qemu_poll_ns () at qemu-timer.c:313 >> 3 aio_poll () at aio-posix.c:457 >> 4 bdrv_flush () at block/io.c:2641 >> 5 bdrv_close () at block.c:2143 >> 6 bdrv_delete () at block.c:2352 >> 7 bdrv_unref () at block.c:3429 >> 8 blk_remove_bs () at block/block-backend.c:427 >> 9 blk_delete () at block/block-backend.c:178 >> 10 blk_unref () at block/block-backend.c:226 >> 11 object_property_del_all () at qom/object.c:399 >> 12 object_finalize () at qom/object.c:461 >> 13 object_unref () at qom/object.c:898 >> 14 object_property_del_child () at qom/object.c:422 >> 15 qmp_marshal_device_del () at qmp-marshal.c:1145 >> 16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929 >> >> Technically bdrv_flush() stucks in >> while (rwco.ret == NOT_DONE) { >> aio_poll(aio_context, true); >> } >> but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation >> reveals that we do not have performed aio_context_acquire() on this call >> stack. >> >> This patch adds missed lock. >> >> Signed-off-by: Denis V. Lunev <d...@openvz.org> >> CC: Kevin Wolf <kw...@redhat.com> >> CC: Max Reitz <mre...@redhat.com> >> CC: Eric Blake <ebl...@redhat.com> >> CC: Markus Armbruster <arm...@redhat.com> >> --- >> block/block-backend.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/block/block-backend.c b/block/block-backend.c >> index 5742c09..65d5da9 100644 >> --- a/block/block-backend.c >> +++ b/block/block-backend.c >> @@ -273,7 +273,11 @@ void blk_unref(BlockBackend *blk) >> if (blk) { >> assert(blk->refcnt > 0); >> if (!--blk->refcnt) { >> + AioContext *ctx = blk_get_aio_context(blk); >> + >> + aio_context_acquire(ctx); >> blk_delete(blk); >> + aio_context_release(ctx); > But I don't think this is quite the correct place. The caller of > blk_unref() should have acquired the AioContext already. As far as I can > tell in this case that would be originally release_drive() in > hw/core/qdev-properties-system.c and then blk_detach_dev(). > > I think the former would be the somehow more correct place but I can > imagine the latter to be more useful in reality. I'll leave it to you. > > (As an alternative, you may of course convince me that this patch is > indeed correct and should be taken as-is. :-)) ha-ha ;) no, I not going to try to convince you and that is why the patch was marked as RFC. I like release_drive() place. This should solve the problem for me. Thank you for the quick answer and good proposal. Den