On 04/02/2016 00:34, Jim Minter wrote:
> I was worried there was
> some way in which the contention could cause an abort and perhaps thence
> the lockup (which does not seem to recover when the host load goes down).

I don't know... It's not the most tested code, but it is not very
complicated either.

The certain points that can be extracted from the kernel messages are:
1) there was a cancellation request that took a long time, >20 seconds;
2) despite taking a long time, it _did_ recover sooner or later because
otherwise you'd not have the lockup splat either.

Paolo

>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>
> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
> (qemu-kvm-1.5.3-105.el7_2.3)