I also suspect something going sideways in the PV spinlock code, but
nothing has changed in the underlying hardware or hypervisor in this
area. There have been bugs in the PV spinlock code in the past,
including using barrier() instead of mb() in the unlock path, which
could let the VCPU releasing a lock kick (or fail to kick) the waiting
VCPU before the write releasing the lock was visible. I looked at the
10.04 kernel, and this particular bug has already been addressed in
its PV spinlock code.
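For reference, the fixed unlock path looks roughly like this (a
simplified sketch modeled on arch/x86/xen/spinlock.c with the
debug/statistics code trimmed, not a verbatim copy of the 2.6.32-341
source):

struct xen_spinlock {
	unsigned char lock;		/* 0 -> free, 1 -> held */
	unsigned short spinners;	/* count of waiting VCPUs */
};

static void xen_spin_unlock(struct raw_spinlock *lock)
{
	struct xen_spinlock *xl = (struct xen_spinlock *)lock;

	smp_wmb();		/* keep critical-section writes before the release */
	xl->lock = 0;		/* release the lock */

	/*
	 * The old bug was a compiler-only barrier() here: the CPU can
	 * still order the read of xl->spinners before the store to
	 * xl->lock becomes visible (store-load reordering across
	 * different locations), so the kick decision is made before
	 * the waiter can see the lock as free.  A full mb() closes
	 * that window.
	 */
	mb();

	if (unlikely(xl->spinners))
		xen_spin_unlock_slow(xl);	/* kick a waiter via its event channel */
}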

These instances are under load when they hang. Here's the uname,
uptime, and /proc/interrupts output from one instance, captured while
it was up and operational but before it hung:

Linux ip-10-94-81-231 2.6.32-341-ec2 #42-Ubuntu SMP Tue Dec 6 14:56:13 UTC 2011 
x86_64 GNU/Linux
16:10:54 up 16 days, 19:52,  0 users,  load average: 9.86, 5.01, 3.41
           CPU0       CPU1       CPU2       CPU3       
 16:  186872780  170473347  170447163  170493692   Dynamic-percpu    timer
 17:  191775644  350788322  357828130  357481319   Dynamic-percpu    resched
 18:      67019      74008      66602      66485   Dynamic-percpu    callfunc
 19:     189590     193987     188670     181119   Dynamic-percpu    call1func
 20:          0          0          0          0   Dynamic-percpu    reboot
 21:  165290618  177938588  177538577  177157514   Dynamic-percpu    spinlock
 22:        410          0          0          0   Dynamic-level     xenbus
 23:          0          0          0          0   Dynamic-level     suspend
 24:        341          0         74        180   Dynamic-level     xencons
 25:     392339     664199     899350     700455   Dynamic-level     blkif
 26:   19953668   46164431   58214738   57029478   Dynamic-level     blkif
 27: 1483445834          0          0          0   Dynamic-level     eth0
NMI:          0          0          0          0   Non-maskable interrupts
RES:  191775644  350788323  357828131  357481320   Rescheduling interrupts
CAL:     256609     267995     255272     247604   Function call interrupts

Over the weekend, m2.4xlarge instances hung as well. I'll work on
getting dmesg output.


** Summary changed:

- Kernel deadlock in scheduler on m2.2xlarge EC2 instance
+ Kernel deadlock in scheduler on m2.{2,4}xlarge EC2 instance
