On Tue, 2 Aug 2016 15:35:34 +0200 Vegard Nossum <vegard.nos...@oracle.com> wrote:
> On 08/02/2016 11:13 AM, Vegard Nossum wrote:
> > On 08/02/2016 11:03 AM, Cornelia Huck wrote:
> >> On Sat, 30 Jul 2016 23:42:18 +0200
> >> Vegard Nossum <vegard.nos...@oracle.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> With fault injection triggering an allocation failure for the
> >>> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> >>> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> >>> (i.e. at least 120 seconds):
> >>>
> > [...]
> >
> >> What happens is that the code falls back to direct virtio addressing
> >> (after indirect addressing failed) - and this should work.
> >>
> >> I'm more inclined to suspect a qemu instead of a kernel bug, as your
> >> qemu version is quite old and there have been fixes in the virtio
> >> buffer handling and virtio-9p in the meantime. (I'm suspecting
> >> "virtio-9p: fix any_layout".)
> >>
> >> Could you retry with a more recent qemu (at least version 2.4)?
> >
> > I think maybe the version number in the stack trace is a bit misleading,
> > this is the full/actual version:
> >
> > $ kvm --version
> > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.1), Copyright
> > (c) 2003-2008 Fabrice Bellard
> >
> > I'll still try to get qemu from git and see if it makes a difference.
> > Thanks,
>
> I still seem to get it:
>
> $ qemu-system-x86_64 --version
> QEMU emulator version 2.6.91 (v2.7.0-rc1-2-gcc0100f-dirty), Copyright
> (c) 2003-2008 Fabrice Bellard

:(

Sorry, no good immediate idea.

One thing would be to check whether you get notified by qemu after the
request was queued (i.e., whether vring_interrupt() ever gets called
with 9p's req_done() after the alloc failure was injected). This would
help to suggest whether to continue debugging here or in qemu.

I still think the root of this error is some failure of the virtio 9p
code to deal with non-indirect buffers, either in the driver or in
qemu.