Public bug reported: I have been using qemu on daily basis 5+ years in order to do btrfs development and testing and it always worked perfectly, until I upgraded the linux kernel of the guests to 4.16. With 4.16+ kernels, when running all the fstests (previously called xfstests), the qemu process hangs (console unresponsive, can't ping or ssh the guest anymore, etc) and stays in a state Sl+ according to 'ps'.
This happens on two different physical machines, one running openSUSE tumbleweed (which I don't access at the moment to check kernel version) and another running xubuntu (tried kernels 4.15.0-32-generic and vanilla 4.18.0). Using any kernel from 4.16 to 4.19-rc5 in the guests (they use different debian versions) makes qemu hang running the fstests suite (after about 30 to 40 minutes, either at test generic/299 or test generic/451). I tried different qemu versions, 2.11.2, 2.12.1 and 3.0.0, and it happens with all of them (all built from the sources available at https://www.qemu.org/download/#source). I built 3.0.0 with debug enabled, using the following parameters for 'configure': --prefix=/home/fdmanana/qemu-3.0.0 --enable-tools --enable-linux-aio --enable-kvm --enable-vnc --enable-vnc-png --enable-debug --extra- cflags="-O0 -g3 -fno-omit-frame-pointer" And captured a coredump of the qemu process, available at: https://www.dropbox.com/s/d1tlsimahykwhla/core_dump_debug.tar.xz?dl=0 the stack traces of all threads, for a quick look: https://friendpaste.com/zqkz2pD0WgSdeSKITHPDf qemu is being invoked with the following script: #!/bin/bash -x sudo modprobe tun sudo modprobe kvm sudo modprobe kvm-intel sudo tunctl -t tap5 -u fdmanana sudo ifconfig tap5 up sudo brctl addif br0 tap5 sudo umount /mnt/tmp5 sudo mkdir -p /mnt/tmp5 sudo mount -t tmpfs -o size=14G tmpfs /mnt/tmp5 for ((i = 2; i <= 7; i++)); do sudo qemu-img create -f qcow2 /mnt/tmp5/disk$i 13G done sudo chown fdmanana /mnt/tmp5/disk* qemu-system-x86_64 -m 4G \ -device virtio-scsi-pci \ -boot c \ \ -drive if=none,file=debian5.qcow2,cache=none,aio=native,cache.direct=on,format=qcow2,id=drive0,discard=on \ -device scsi-hd,drive=drive0,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk2,cache=writeback,format=qcow2,id=drive1,discard=on \ -device scsi-hd,drive=drive1,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk3,cache=writeback,format=qcow2,id=drive2,discard=on \ -device scsi-hd,drive=drive2,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk4,cache=writeback,format=qcow2,id=drive3,discard=on \ -device scsi-hd,drive=drive3,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk5,cache=writeback,format=qcow2,id=drive4,discard=on \ -device scsi-hd,drive=drive4,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk6,cache=writeback,format=qcow2,id=drive5,discard=on \ -device scsi-hd,drive=drive5,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk7,cache=writeback,format=qcow2,id=drive6,discard=on \ -device scsi-hd,drive=drive6,bus=scsi.0 \ \ -drive if=none,file=disk8,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive7,discard=on \ -device scsi-hd,drive=drive7,bus=scsi.0 \ \ -drive if=none,file=disk9,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive8,discard=on \ -device scsi-hd,drive=drive8,bus=scsi.0 \ \ -drive if=none,file=disk10,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive9,discard=on \ -device scsi-hd,drive=drive9,bus=scsi.0 \ \ -net nic,macaddr=52:54:00:12:34:fa -net tap,ifname=tap5,script=no,downscript=no \ -rtc base=localtime -enable-kvm -machine accel=kvm -smp 4 -cpu host \ -k pt -serial tcp:127.0.0.1:9997 -display vnc=:5 Is there anything else I can provided to help debug this? Thanks. ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1794950 Title: qemu hangs when guest is using linux kernel 4.16+ Status in QEMU: New Bug description: I have been using qemu on daily basis 5+ years in order to do btrfs development and testing and it always worked perfectly, until I upgraded the linux kernel of the guests to 4.16. With 4.16+ kernels, when running all the fstests (previously called xfstests), the qemu process hangs (console unresponsive, can't ping or ssh the guest anymore, etc) and stays in a state Sl+ according to 'ps'. This happens on two different physical machines, one running openSUSE tumbleweed (which I don't access at the moment to check kernel version) and another running xubuntu (tried kernels 4.15.0-32-generic and vanilla 4.18.0). Using any kernel from 4.16 to 4.19-rc5 in the guests (they use different debian versions) makes qemu hang running the fstests suite (after about 30 to 40 minutes, either at test generic/299 or test generic/451). I tried different qemu versions, 2.11.2, 2.12.1 and 3.0.0, and it happens with all of them (all built from the sources available at https://www.qemu.org/download/#source). I built 3.0.0 with debug enabled, using the following parameters for 'configure': --prefix=/home/fdmanana/qemu-3.0.0 --enable-tools --enable-linux-aio --enable-kvm --enable-vnc --enable-vnc-png --enable-debug --extra- cflags="-O0 -g3 -fno-omit-frame-pointer" And captured a coredump of the qemu process, available at: https://www.dropbox.com/s/d1tlsimahykwhla/core_dump_debug.tar.xz?dl=0 the stack traces of all threads, for a quick look: https://friendpaste.com/zqkz2pD0WgSdeSKITHPDf qemu is being invoked with the following script: #!/bin/bash -x sudo modprobe tun sudo modprobe kvm sudo modprobe kvm-intel sudo tunctl -t tap5 -u fdmanana sudo ifconfig tap5 up sudo brctl addif br0 tap5 sudo umount /mnt/tmp5 sudo mkdir -p /mnt/tmp5 sudo mount -t tmpfs -o size=14G tmpfs /mnt/tmp5 for ((i = 2; i <= 7; i++)); do sudo qemu-img create -f qcow2 /mnt/tmp5/disk$i 13G done sudo chown fdmanana /mnt/tmp5/disk* qemu-system-x86_64 -m 4G \ -device virtio-scsi-pci \ -boot c \ \ -drive if=none,file=debian5.qcow2,cache=none,aio=native,cache.direct=on,format=qcow2,id=drive0,discard=on \ -device scsi-hd,drive=drive0,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk2,cache=writeback,format=qcow2,id=drive1,discard=on \ -device scsi-hd,drive=drive1,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk3,cache=writeback,format=qcow2,id=drive2,discard=on \ -device scsi-hd,drive=drive2,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk4,cache=writeback,format=qcow2,id=drive3,discard=on \ -device scsi-hd,drive=drive3,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk5,cache=writeback,format=qcow2,id=drive4,discard=on \ -device scsi-hd,drive=drive4,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk6,cache=writeback,format=qcow2,id=drive5,discard=on \ -device scsi-hd,drive=drive5,bus=scsi.0 \ \ -drive if=none,file=/mnt/tmp5/disk7,cache=writeback,format=qcow2,id=drive6,discard=on \ -device scsi-hd,drive=drive6,bus=scsi.0 \ \ -drive if=none,file=disk8,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive7,discard=on \ -device scsi-hd,drive=drive7,bus=scsi.0 \ \ -drive if=none,file=disk9,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive8,discard=on \ -device scsi-hd,drive=drive8,bus=scsi.0 \ \ -drive if=none,file=disk10,cache=writeback,aio=native,cache.direct=on,format=qcow2,id=drive9,discard=on \ -device scsi-hd,drive=drive9,bus=scsi.0 \ \ -net nic,macaddr=52:54:00:12:34:fa -net tap,ifname=tap5,script=no,downscript=no \ -rtc base=localtime -enable-kvm -machine accel=kvm -smp 4 -cpu host \ -k pt -serial tcp:127.0.0.1:9997 -display vnc=:5 Is there anything else I can provided to help debug this? Thanks. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1794950/+subscriptions