On 07.09.2018 at 18:21, Marc Hartmayer wrote:
> On Fri, Sep 07, 2018 at 11:34 AM +0200, Kevin Wolf wrote:
> > On 06.09.2018 at 21:29, Christian Borntraeger wrote:
> >> Kevin,
> >>
> >> for reference, it seems that this bug report somehow got lost.
> >> https://bugs.launchpad.net/qemu/+bug/1788582
> >
> > That looks... interesting. The reproducer doesn't even seem to use a
> > block device, and the backtrace shows a QEMU that is just sitting in the
> > main loop waiting for events, not somewhere in the shutdown process
> > after exiting the main loop where bdrv_drain_all() would be called. I
> > fail to even come up with a theory about any connection between this and
> > commit 0f12264e7.
> >
> > I think we need more information there. Can you set a breakpoint on
> > bdrv_drain_all_begin() to see where it's even called? When I start a
> > qemu instance without a block device, the first time this is called is
> > during shutdown after the main loop (i.e. after the place where you're
> > seeing a hang).
>
> I can try that.
>
> >
> > Maybe bisect within the commit that seems to cause the bug, by
> > selectively disabling some hunks in it?
>
> If I remove the line(s)
>
>     /* Execute pending BHs first (may modify the graph) and check everything
>      * else only after the BHs have executed. */
>     while (aio_poll(qemu_get_aio_context(), false));
>
> in the function 'bdrv_drain_all_poll', then it works.

It still doesn't make sense to me why this would make any difference
without a block device (and without iothreads), but you could give this
patch series of mine a try:

    [PATCH 00/14] Fix some jobs/drain/aio_poll related hangs

Amongst others, it does remove the line you quoted.

Kevin
