On Sun, Mar 18, 2012 at 2:40 PM, Chris Webb <ch...@arachsys.com> wrote:
> Whilst you have patches in progress for the queue draining issue with the IO
> throttling code which triggers the assert()s in the ide driver, I thought I
> should report a second bug I've seen. I'm not sure whether it's related, but
> none of the patch series posted so far appear to fix or affect it.
>
> I find that if I start a guest booting linux using extlinux and set a
> bytes-per-second throttle value less than about 4MB/s, qemu tends to lock up
> completely while the bootloader is loading the kernel. For example, there's
> a tiny 10MB ext4 filesystem gzipped up at
>
>   http://cdw.me.uk/tmp/test.img.gz
>
> which just contains extlinux and a kernel. If you run a VM with qemu HEAD as
>
>   qemu -m 1024 -vnc :1 -drive
>   if=none,id=ide.0.0,format=raw,cache=none,file=test.img,bps=10000000 -device
>   ide-drive,bus=ide.0,unit=0,bootindex=1,drive=ide.0.0 -monitor stdio
>
> and watch on VNC, you'll see it hangs whilst loading the kernel. Once this
> has happened, no further interaction with the monitor is possible, and the
> VNC socket becomes completely unresponsive. This happens about half of the
> time with bps set as high as 2*1024*1024.
>
> I first saw this with the version of the block throttling patches I'd
> back-ported on top of qemu-kvm 1.0, but have checked that the problem is
> still present in HEAD as of this afternoon [361dea401f52].

Thanks for reporting this. Zhi Yong is travelling so he may not be able to
access email for a few days.

I downloaded your image and reproduced the issue on qemu.git/master
5bd33de6 ("tcg: fix sparc host for AREG0 free operation"). I set bps
to 1 MB per second, which is low but valid. VNC and the QEMU monitor froze.
I attached with gdb:

$ gdb -p 3705 x86_64-softmmu/qemu-system-x86_64
(gdb) thread apply all bt

Thread 2 (Thread 0x7f433dea9700 (LWP 3706)):
#0  0x00007f434745a690 in qemu_aio_wait () at aio.c:166
#1  0x00007f434746d2bd in bdrv_rw_co (bs=<optimized out>,
    sector_num=<optimized out>, buf=<optimized out>,
    nb_sectors=<optimized out>, is_write=<optimized out>) at block.c:1473
#2  0x00007f43474ed86e in ide_sector_read (s=0x7f43488d6a58)
    at /home/stefanha/qemu/hw/ide/core.c:480
#3  0x00007f43474ecbf7 in ide_data_readw (opaque=<optimized out>,
    addr=<optimized out>) at /home/stefanha/qemu/hw/ide/core.c:1692
#4  0x00007f43475d7d3b in memory_region_iorange_read
    (iorange=0x7f434890bd70, offset=496, width=2, data=0x7f433dea8c50)
    at /home/stefanha/qemu/memory.c:396
#5  0x00007f43475c84b7 in ioport_readw_thunk (opaque=<optimized out>,
    addr=<optimized out>) at /home/stefanha/qemu/ioport.c:195
#6  0x00007f43475c8d82 in ioport_read (address=<optimized out>, index=1)
    at /home/stefanha/qemu/ioport.c:70
#7  cpu_inw (addr=<optimized out>) at /home/stefanha/qemu/ioport.c:318
#8  0x00007f43475cbc21 in kvm_handle_io (count=256, size=2, direction=0,
    data=<optimized out>, port=496) at /home/stefanha/qemu/kvm-all.c:1117
#9  kvm_cpu_exec (env=0x7f4348870240) at /home/stefanha/qemu/kvm-all.c:1274
#10 0x00007f43475a7171 in qemu_kvm_cpu_thread_fn (arg=0x7f4348870240)
    at /home/stefanha/qemu/cpus.c:733
#11 0x00007f43458dbb50 in start_thread (arg=<optimized out>)
    at pthread_create.c:304
#12 0x00007f4343a0990d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f43473a38c0 (LWP 3705)):
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f43458de339 in _L_lock_926 ()
    from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f43458de15b in __pthread_mutex_lock (mutex=0x7f43482532c0)
    at pthread_mutex_lock.c:61
#3  0x00007f4347555409 in qemu_mutex_lock (mutex=<optimized out>)
    at qemu-thread-posix.c:54
#4  0x00007f434752b96c in main_loop_wait (nonblocking=<optimized out>)
    at main-loop.c:460
#5  0x00007f4347454417 in main_loop () at /home/stefanha/qemu/vl.c:1552
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /home/stefanha/qemu/vl.c:3628

What this tells me is:

1. The vcpu thread is blocked in qemu_aio_wait() - it's waiting for I/O
   request(s) to complete.

2. The iothread is trying to acquire the global mutex but is blocked
   because the vcpu thread has it. Therefore the monitor and VNC do not
   work.

There is a throttled I/O request in a queue and a timer has been set to
wake up and issue the request. The vcpu thread is in qemu_aio_wait(),
which does not invoke timer callbacks, so we have deadlocked.

This is kind of a fundamental problem because timers use the iothread
event loop but we're in a synchronous context - we're in the vcpu
thread and the iothread will not be able to execute.

In this specific case it would be nice to convert hw/ide/* to use
bdrv_aio_*() instead of synchronous block I/O functions. In the
general case we may need to build a warning or something into qemu to
catch this situation when it occurs.

Stefan
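
P.S. For anyone who wants to see the shape of the deadlock without reading the backtraces, here is a toy model in Python. It is only a sketch and not QEMU code: `run_model`, `global_mutex`, `iothread_loop` and the `synchronous` flag are all made-up names, and the `timeout` exists only so the model terminates (the real hang never times out). The "vcpu" side takes the lock and waits for a completion that only a timer callback can deliver, while the "iothread" side needs the same lock before it can run timers.

```python
import threading


def run_model(synchronous, timeout=0.5):
    """Toy model of the hang. Returns True if the throttled read completes."""
    global_mutex = threading.Lock()    # stands in for the global mutex
    request_done = threading.Event()   # set once the queued request is issued
    stop = threading.Event()
    timers = [request_done.set]        # throttle timer: issues the queued request

    def iothread_loop():
        # The event loop must hold the global mutex to run timer callbacks.
        while not stop.is_set():
            if global_mutex.acquire(timeout=0.05):
                try:
                    while timers:
                        timers.pop()()
                finally:
                    global_mutex.release()

    io = threading.Thread(target=iothread_loop)
    global_mutex.acquire()             # vcpu thread handles the guest's port read
    io.start()
    if synchronous:
        # Like the synchronous read path: wait while still holding the mutex.
        # The wait does not run timers, so nothing can complete the request.
        completed = request_done.wait(timeout)
        global_mutex.release()
    else:
        # Asynchronous style: submit, drop the mutex, let the event loop run.
        global_mutex.release()
        completed = request_done.wait(timeout)
    stop.set()
    io.join()
    return completed


if __name__ == "__main__":
    print("synchronous read completed: ", run_model(synchronous=True))
    print("asynchronous read completed:", run_model(synchronous=False))
```

In the synchronous case the model gives up after `timeout` and reports that the read never completed; in the asynchronous case the event loop gets the lock, the timer fires, and the read completes. That is the reasoning behind preferring bdrv_aio_*() in hw/ide/*, though the real conversion is of course more involved than releasing a lock.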