On Mon, Jan 2, 2012 at 3:39 PM, Christoph Hellwig <h...@lst.de> wrote:
> On Fri, Dec 30, 2011 at 10:35:01AM +0000, Stefan Hajnoczi wrote:
>> If you can reproduce this bug and suspect coroutines are involved then I
>
> It's entirely reproducible.
>
> I've played around a bit and switched from the ucontext to the gthread
> coroutine implementation.  The result seems odd, but starts to make sense.
>
> Running the workload I now get the following message from qemu:
>
>     Co-routine re-entered recursively
>
> and the gdb backtrace looks like:
>
> (gdb) bt
> #0  0x00007f2fff36f405 in *__GI_raise (sig=<optimized out>)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007f2fff372680 in *__GI_abort () at abort.c:92
> #2  0x00007f30019a6616 in qemu_coroutine_enter (co=0x7f3004d4d7b0, opaque=0x0)
>     at qemu-coroutine.c:53
> #3  0x00007f30019a5e82 in qemu_co_queue_next_bh (opaque=<optimized out>)
>     at qemu-coroutine-lock.c:43
> #4  0x00007f30018d5a72 in qemu_bh_poll () at async.c:71
> #5  0x00007f3001982990 in main_loop_wait (nonblocking=<optimized out>)
>     at main-loop.c:472
> #6  0x00007f30018cf714 in main_loop () at /home/hch/work/qemu/vl.c:1481
> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>     at /home/hch/work/qemu/vl.c:3479
>
> Adding some printfs suggests this happens when calling add_aio_request from
> aio_read_response when either delaying creates or updating metadata,
> although not every time one of these cases happens.
>
> I've tried to understand how the recursive calling happens, but
> unfortunately the whole coroutine code lacks any documentation of how it
> should behave or what it asserts about its callers.
>
>> I don't have a sheepdog setup here but if there's an easy way to
>> reproduce please let me know and I'll take a look.
>
> With the small patch below applied to the sheepdog source I can reproduce
> the issue on my laptop using the following setup:
>
>     for port in 7000 7001 7002; do
>         mkdir -p /mnt/sheepdog/$port
>         /usr/sbin/sheep -p $port -c local /mnt/sheepdog/$port
>         sleep 2
>     done
>
>     collie cluster format
>     collie vdi create test 20G
>
> then start a qemu instance that uses the sheepdog volume with the
> following drive and device lines:
>
>     -drive if=none,file=sheepdog:test,cache=none,id=test \
>     -device virtio-blk-pci,drive=test,id=testdev \
>
> and finally, in the guest, run:
>
>     dd if=/dev/zero of=/dev/vdX bs=67108864 count=128 oflag=direct
Thanks for these instructions.  I can reproduce the issue here.

The way BDRVSheepdogState->co_recv and ->co_send work looks suspicious.
The code adds select(2) read/write callback functions on the sheepdog
socket file descriptor.  When the socket becomes writable or readable,
the co_send or co_recv coroutine is entered.  So far, so good; this is
how a coroutine is integrated into QEMU's main loop.

The problem is that this patch mixes the two directions.  The co_recv
coroutine runs aio_read_response(), which invokes send_pending_req(),
which in turn invokes add_aio_request().  That function isn't suitable
for co_recv's context because it actually sends data and hits a few
blocking (yield) points: it takes a coroutine mutex, while the select(2)
read callback is still in place.  We're still in the aio_read_response()
call chain, except we're not reading at all, we're trying to write!  And
we'll get spurious wakeups whenever there is data readable on the socket.

So the co_recv coroutine has two things in the system that will try to
enter it:

1. The select(2) read callback on the sheepdog socket.

2. The add_aio_request() blocking operations, including the coroutine
   mutex.

This is bad: a yielded coroutine should have exactly one thing that will
enter it.  It's rare that it makes sense to have multiple things entering
a coroutine.
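To make the two entry points concrete, here is a condensed sketch of the
shape of the code.  The names mirror the patch, but the bodies are
trimmed and partly reconstructed, so read it as an illustration rather
than the exact driver source:

    static void coroutine_fn aio_read_response(void *opaque);

    /* Enterer #1: the select(2) read callback registered on the
     * sheepdog socket.  It fires whenever the socket is readable.
     */
    static void co_read_response(void *opaque)
    {
        BDRVSheepdogState *s = opaque;

        if (!s->co_recv) {
            s->co_recv = qemu_coroutine_create(aio_read_response);
        }

        qemu_coroutine_enter(s->co_recv, opaque);
    }

    static void coroutine_fn aio_read_response(void *opaque)
    {
        BDRVSheepdogState *s = opaque;
        uint64_t oid = 0;

        /* ... read the response header and pick up the finished
         * request's object id ... */

        /* Enterer #2: send_pending_req() -> add_aio_request(), which
         * takes s->lock (a CoMutex) and writes to the socket.  Both
         * are yield points, and each arranges its own re-entry of
         * co_recv.  While co_recv is parked on one of them, the
         * socket can still become readable, so enterer #1 fires as
         * well and qemu_coroutine_enter() finds a coroutine that
         * already has a caller: "Co-routine re-entered recursively".
         */
        send_pending_req(s, oid);
    }

It's late here but I hope this gives Kazutaka some thoughts on what is
causing the issue with this patch.

Stefan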