On Tue, 17 Jul 2007 17:49:34 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> Subject: fix the softlockup watchdog to actually work > From: Ingo Molnar <[EMAIL PROTECTED]> > > this Xen related commit: > > commit 966812dc98e6a7fcdf759cbfa0efab77500a8868 > Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > Date: Tue May 8 00:28:02 2007 -0700 > > Ignore stolen time in the softlockup watchdog > > broke the softlockup watchdog to never report any lockups. (!) > > print_timestamp defaults to 0, this makes the following condition > always true: > > if (print_timestamp < (touch_timestamp + 1) || > > and we'll in essence never report soft lockups. > > apparently the functionality of the soft lockup watchdog was never > actually tested with that patch applied ... > > [this is -stable material too.] This seems terribly sensitive. Someone has broken the Vaio (shock, horror). It now has mysterious jerkiness: when leaning on autorepeat it stalls for maybe 0.25 seconds every 1.5 seconds. The stalls are far less than a second. Yet this is enough to trigger random softlockup warnings. Some of those warnings are below. Note that the traces are all pretty useless, as softlockup warnings so often seem to be. Of course, it could be that whatever is causing these pauses really _is_ stalling for a whole second occasionally, dunno. But I didn't notice any long stalls in the console output when a particular storm of softlockup warnings came out. But I'll sit on this patch for a while until this gets sorted out. Meanwhile, please double-check the elapsed-time arithmetic in there, maybe do a bit of runtime testing? [ 78.820961] BUG: soft lockup detected on CPU#0! [ 78.821083] [<c0122475>] update_process_times+0x32/0x54 [ 78.821216] [<c012fe7a>] tick_sched_timer+0x61/0x9c [ 78.821340] [<c012c2e7>] hrtimer_interrupt+0x142/0x1d4 [ 78.821463] [<c012fe19>] tick_sched_timer+0x0/0x9c [ 78.821587] [<c012f74a>] tick_do_broadcast+0x1f/0x3f [ 78.821707] [<c012fa01>] tick_handle_oneshot_broadcast+0x47/0x72 [ 78.821852] [<c01067ca>] timer_interrupt+0x1a/0x20 [ 78.821968] [<c014291e>] handle_IRQ_event+0x1a/0x3f [ 78.822089] [<c0143521>] handle_edge_irq+0x9d/0xcc [ 78.822206] [<c0105d7b>] do_IRQ+0x53/0x6c [ 78.822307] [<c012f4f0>] tick_notify+0x15c/0x208 [ 78.822422] [<c01044cf>] common_interrupt+0x23/0x28 [ 78.822539] [<c012f1d4>] clockevents_notify+0x8/0x36 [ 78.822663] [<c020d199>] acpi_processor_idle+0x1d2/0x36d [ 78.822798] [<c0102345>] cpu_idle+0x44/0x5e [ 78.822900] [<c03baa8d>] start_kernel+0x26d/0x275 [ 78.823017] [<c03ba3fe>] unknown_bootoption+0x0/0x202 [ 78.823142] ======================= [ 106.282830] BUG: soft lockup detected on CPU#0! [ 106.282967] [<c0122475>] update_process_times+0x32/0x54 [ 106.283116] [<c012fe7a>] tick_sched_timer+0x61/0x9c [ 106.283255] [<c012c2e7>] hrtimer_interrupt+0x142/0x1d4 [ 106.283391] [<c012fe19>] tick_sched_timer+0x0/0x9c [ 106.283530] [<c012f74a>] tick_do_broadcast+0x1f/0x3f [ 106.283663] [<c012fa01>] tick_handle_oneshot_broadcast+0x47/0x72 [ 106.283821] [<c01067ca>] timer_interrupt+0x1a/0x20 [ 106.283949] [<c014291e>] handle_IRQ_event+0x1a/0x3f [ 106.284084] [<c0143521>] handle_edge_irq+0x9d/0xcc [ 106.284215] [<c0105d7b>] do_IRQ+0x53/0x6c [ 106.284326] [<c012f4f0>] tick_notify+0x15c/0x208 [ 106.284455] [<c01044cf>] common_interrupt+0x23/0x28 [ 106.284587] [<c012f1d4>] clockevents_notify+0x8/0x36 [ 106.284725] [<c020d199>] acpi_processor_idle+0x1d2/0x36d [ 106.284875] [<c0102345>] cpu_idle+0x44/0x5e [ 106.284988] [<c03baa8d>] start_kernel+0x26d/0x275 [ 106.285117] [<c03ba3fe>] unknown_bootoption+0x0/0x202 [ 106.285257] ======================= [ 109.266423] BUG: soft lockup detected on CPU#0! [ 109.266558] [<c0122475>] update_process_times+0x32/0x54 [ 109.266703] [<c012fe7a>] tick_sched_timer+0x61/0x9c [ 109.270745] [<c012c2e7>] hrtimer_interrupt+0x142/0x1d4 [ 109.274790] [<c012fe19>] tick_sched_timer+0x0/0x9c [ 109.278865] [<c012f74a>] tick_do_broadcast+0x1f/0x3f [ 109.282950] [<c012fa01>] tick_handle_oneshot_broadcast+0x47/0x72 [ 109.287026] [<c01067ca>] timer_interrupt+0x1a/0x20 [ 109.291012] [<c014291e>] handle_IRQ_event+0x1a/0x3f [ 109.294950] [<c0143521>] handle_edge_irq+0x9d/0xcc [ 109.298864] [<c0105d7b>] do_IRQ+0x53/0x6c [ 109.302818] [<c012f4f0>] tick_notify+0x15c/0x208 [ 109.306740] [<c01044cf>] common_interrupt+0x23/0x28 [ 109.310641] [<c012f1d4>] clockevents_notify+0x8/0x36 [ 109.314543] [<c020d199>] acpi_processor_idle+0x1d2/0x36d [ 109.318461] [<c0102345>] cpu_idle+0x44/0x5e [ 109.322348] [<c03baa8d>] start_kernel+0x26d/0x275 [ 109.326267] [<c03ba3fe>] unknown_bootoption+0x0/0x202 [ 109.330188] ======================= (ah, the Vaio breakage seems to be -mm-only, whew) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/