On 10.06.2015 at 11:34, Fam Zheng wrote:
> On Wed, 06/10 11:18, Christian Borntraeger wrote:
>> On 10.06.2015 at 04:12, Fam Zheng wrote:
>>> On Tue, 06/09 11:01, Christian Borntraeger wrote:
>>>> On 09.06.2015 at 04:28, Fam Zheng wrote:
>>>>> On Tue, 06/02 16:36, Christian Borntraeger wrote:
>>>>>> Paolo,
>>>>>>
>>>>>> I bisected
>>>>>> commit a0710f7995f914e3044e5899bd8ff6c43c62f916
>>>>>> Author:     Paolo Bonzini <pbonz...@redhat.com>
>>>>>> AuthorDate: Fri Feb 20 17:26:52 2015 +0100
>>>>>> Commit:     Kevin Wolf <kw...@redhat.com>
>>>>>> CommitDate: Tue Apr 28 15:36:08 2015 +0200
>>>>>>
>>>>>>     iothread: release iothread around aio_poll
>>>>>>
>>>>>> to cause a problem with hanging guests.
>>>>>>
>>>>>> Having many guests all with a kernel/ramdisk (via -kernel) and
>>>>>> several null block devices will result in hangs. All hanging
>>>>>> guests are in partition detection code waiting for an I/O to return
>>>>>> so very early maybe even the first I/O.
>>>>>>
>>>>>> Reverting that commit "fixes" the hangs.
>>>>>> Any ideas?
>>>>>
>>>>> Christian, I can't reproduce this on my x86 box with virtio-blk-pci. Do you
>>>>> have a reproducer for x86? Or could you collect backtraces for all the
>>>>> threads in QEMU when it hangs?
>>>>>
>>>>> My long shot is that the main loop is blocked at aio_context_acquire(ctx),
>>>>> while the iothread of that ctx is blocked at aio_poll(ctx, blocking).
>>>>
>>>> Here is a backtrace on s390. I need 2 or more disks, (one is not enough).
>>>
>>> It shows iothreads and main loop are all waiting for events, and the vcpu
>>> threads are running guest code.
>>>
>>> It could be the requests being leaked. Do you see this problem with a
>>> regular file based image or null-co driver? Maybe we're missing something
>>> about the AioContext in block/null.c.
>>
>> It seems to run with normal file based images. As soon as I have two or more
>> null-aio devices it hangs pretty soon when doing a reboot loop.
>>
>
> Ahh! If it's a reboot loop, the device reset thing may get fishy. I suspect
> the completion BH used by null-aio may be messed up, that's why I wonder
> whether null-co:// would work for you. Could you test that?

null-co also fails.

>
> Also, could you try below patch with null-aio://, too?

The same. Guests still get stuck.

>
> Thanks,
> Fam
>
> ---
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cd539aa..c87b444 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -652,15 +652,11 @@ static void virtio_blk_reset(VirtIODevice *vdev)
>  {
>      VirtIOBlock *s = VIRTIO_BLK(vdev);
>
> -    if (s->dataplane) {
> -        virtio_blk_data_plane_stop(s->dataplane);
> -    }
> -
> -    /*
> -     * This should cancel pending requests, but can't do nicely until there
> -     * are per-device request lists.
> -     */
>      blk_drain_all();
> +    if (s->dataplane) {
> +        virtio_blk_data_plane_stop(s->dataplane);
> +    }
> +
>      blk_set_enable_write_cache(s->blk, s->original_wce);
>  }
>