On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
> after some testing I tried to narrow down a problem, which was initially
> reported by some users.
> Seen on different distros - debian 7.1, ubuntu 12.04 LTS, IPFire-2.3 as
> reported by now.
>
> All using some flavour of linux-3.2.x kernel.
>
> Tried e.g. under Ubuntu an upgrade to "Linux 3.8.0-27-generic x86_64" which
> solves the problem.

Is that a guest kernel upgrade?

> Problem could be triggered with some workload like:
>
> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
> and in parallel do some apt-get install/remove/whatever.
>
> That results in a somewhat stuck qemu-session with the bad
> "kernel_hung_task..." messages.
>
> A typical command-line is as follows:
>
> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm
> -daemonize -pidfile /var/run/qemu-server/760.pid -monitor
> unix:/var/run/qemu-server/760.mon,server,nowait -vnc
> unix:/var/run/qemu-server/760.vnc,password -qmp
> unix:/var/run/qemu-server/760.qmp,server,nowait -nodefaults -serial none
> -parallel none -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0
> -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh
> -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1
> -device virtio-blk-pci,drive=virtio0 -drive
> format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native
> -drive
> format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native
> -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on -drive
> if=ide,media=cdrom,id=ide1-cd1,readonly=on -boot order=dc
>
> no "system_reset", "sendkey ctrl-alt-delete" or "q" in monitoring-session
> is accepted, need to hard-kill the process.

Yesterday I saw a possibly related report on IRC.  It was a Windows guest
running under OpenStack with images on Ceph.

They reported that the QEMU process would lock up: ping would not work and
their management tools showed 0 CPU activity for the guest.  However, they
were able to "kick" the guest by taking a VNC screenshot (I think).  Then it
would come back to life.

If you have a Linux guest that is reporting kernel_hung_task, then it could
be a similar scenario.

Please confirm that the hung task message is from inside the guest.

If you are able to reproduce this and have an alternative non-Ceph storage
pool, please try that, since Ceph is common to both of these bug reports.

Stefan