I also suspect something is going sideways in the PV spinlock code, but nothing has changed in the underlying hardware or hypervisor in this area. There have been bugs in the PV spinlock code in the past, including one where the unlock path used barrier() instead of mb(), which let the VCPU holding a lock check for waiting VCPUs (and decide whether to kick one) before the memory write releasing the lock was complete. I looked at the 10.04 kernel, and this particular bug is already addressed in the PV spinlock code.
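To make the ordering requirement concrete, here is a minimal user-space sketch in C11 atomics. It is not the kernel code: struct pv_lock and the pv_* helpers are hypothetical names of mine, and atomic_thread_fence(memory_order_seq_cst) stands in for the kernel's mb(). The point is only that the store releasing the lock has to be ordered before the load that checks for waiters, which a compiler-only barrier does not guarantee on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified lock. NOT the kernel's struct xen_spinlock. */
struct pv_lock {
    atomic_uchar  lock;      /* 0 = free, 1 = held */
    atomic_ushort spinners;  /* VCPUs that are blocked and expect a kick */
};

/* Try to take the lock once; returns true on success. */
static bool pv_trylock(struct pv_lock *l)
{
    unsigned char expected = 0;
    return atomic_compare_exchange_strong(&l->lock, &expected, 1);
}

/* Waiter side: advertise ourselves before blocking in the hypervisor. */
static void pv_announce_spinner(struct pv_lock *l)
{
    atomic_fetch_add(&l->spinners, 1);
}

/* Release the lock. Returns true if the caller must kick a waiter. */
static bool pv_unlock(struct pv_lock *l)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load(&l->lock) == 1);

    /* Release store: earlier writes cannot move past the unlock. */
    atomic_store_explicit(&l->lock, 0, memory_order_release);

    /*
     * Full fence (assumed stand-in for mb()): orders the store above
     * before the load below. Without it the CPU may perform the load
     * of `spinners` while the unlock store is still in its store
     * buffer, and a waiter that arrives in that window is never kicked.
     */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&l->spinners, memory_order_relaxed) != 0;
}
```

A caller would take the lock with pv_trylock(), and on pv_unlock() returning true issue whatever wakeup the platform provides (an event-channel kick under Xen).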
These instances are under load when they hang. Here's the uname, uptime, and /proc/interrupts output from one instance, taken while it was still operational, before it hung:

Linux ip-10-94-81-231 2.6.32-341-ec2 #42-Ubuntu SMP Tue Dec 6 14:56:13 UTC 2011 x86_64 GNU/Linux
 16:10:54 up 16 days, 19:52,  0 users,  load average: 9.86, 5.01, 3.41

           CPU0       CPU1       CPU2       CPU3
 16:  186872780  170473347  170447163  170493692  Dynamic-percpu  timer
 17:  191775644  350788322  357828130  357481319  Dynamic-percpu  resched
 18:      67019      74008      66602      66485  Dynamic-percpu  callfunc
 19:     189590     193987     188670     181119  Dynamic-percpu  call1func
 20:          0          0          0          0  Dynamic-percpu  reboot
 21:  165290618  177938588  177538577  177157514  Dynamic-percpu  spinlock
 22:        410          0          0          0  Dynamic-level   xenbus
 23:          0          0          0          0  Dynamic-level   suspend
 24:        341          0         74        180  Dynamic-level   xencons
 25:     392339     664199     899350     700455  Dynamic-level   blkif
 26:   19953668   46164431   58214738   57029478  Dynamic-level   blkif
 27: 1483445834          0          0          0  Dynamic-level   eth0
NMI:          0          0          0          0  Non-maskable interrupts
RES:  191775644  350788323  357828131  357481320  Rescheduling interrupts
CAL:     256609     267995     255272     247604  Function call interrupts

Over the weekend, m2.4xlarge instances hung as well. I'll work on getting dmesg output.

** Summary changed:

- Kernel deadlock in scheduler on m2.2xlarge EC2 instance
+ Kernel deadlock in scheduler on m2.{2,4}xlarge EC2 instance

-- 
https://bugs.launchpad.net/bugs/929941

Title:
  Kernel deadlock in scheduler on m2.{2,4}xlarge EC2 instance