Hi everyone, I ran into the deadlock situation reported at the bottom. Actually, on my latest 3.10 kernel for some reason I don't get the report (the kernel just hangs for some reason), so it took me quite some time to track it down.
Once I figured the trigger to the machine hanging was adjtimex(), I reverted everything (between 3.9 to 3.10) that was touching kernel/time/timekeeping/timekeeping.c and kernel/time/ntp.c, I double checked that indeed the problem was not happening anymore, and finally started bisecting, landing on the following offending commit. THEN, and ONLY THEN, did I get the &%""รง+"% deadlock report. Do you guys have any ideas what could be wrong and how to fix it? Thank you, Gerlando commit 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 Author: John Stultz <john.stu...@linaro.org> Date: Fri Mar 22 11:37:28 2013 -0700 timekeeping: Hold timekeepering locks in do_adjtimex and hardpps In moving the NTP state to be protected by the timekeeping locks, be sure to acquire the timekeeping locks prior to calling ntp functions. Cc: Thomas Gleixner <t...@linutronix.de> Cc: Richard Cochran <richardcoch...@gmail.com> Cc: Prarit Bhargava <pra...@redhat.com> Signed-off-by: John Stultz <john.stu...@linaro.org> ================================= [ INFO: inconsistent lock state ] 3.10.0-04864-g346ecc9-dirty #16 Not tainted --------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. SAKEY/738 [HC0[0]:SC0[0]:HE1:SE1] takes: (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc {IN-HARDIRQ-W} state was registered at: [<c0055138>] __lock_acquire+0xabc/0x1bb8 [<c0056838>] lock_acquire+0xa8/0x15c [<c04c14ec>] _raw_spin_lock_irqsave+0x50/0x64 [<c00497a4>] do_timer+0x2c/0xa54 [<c004e7f4>] tick_periodic+0x74/0x9c [<c004e834>] tick_handle_periodic+0x18/0x7c [<c001349c>] orion_timer_interrupt+0x24/0x34 [<c0069c2c>] handle_irq_event_percpu+0x5c/0x300 [<c0069f0c>] handle_irq_event+0x3c/0x5c [<c006c194>] handle_level_irq+0x8c/0xe8 [<c0069574>] generic_handle_irq+0x30/0x4c [<c000951c>] handle_IRQ+0x30/0x84 [<c04c2178>] __irq_svc+0x38/0xa0 [<c06cf15c>] calibrate_delay+0x350/0x4e4 [<c06986e0>] start_kernel+0x23c/0x2c4 [<0000803c>] 0x803c irq event stamp: 32358 hardirqs last enabled at (32357): [<c0008c64>] ret_fast_syscall+0x24/0x44 hardirqs last disabled at (32358): [<c04c14bc>] _raw_spin_lock_irqsave+0x20/0x64 softirqs last enabled at (32160): [<c001e234>] __do_softirq+0x1b8/0x308 softirqs last disabled at (32137): [<c001e77c>] irq_exit+0xa0/0xd8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(timekeeper_lock); <Interrupt> lock(timekeeper_lock); *** DEADLOCK *** 1 lock held by SAKEY/738: #0: (timekeeper_lock){?.-...}, at: [<c004a3e4>] do_adjtimex+0x64/0xbc stack backtrace: CPU: 0 PID: 738 Comm: SAKEY Not tainted 3.10.0-04864-g346ecc9-dirty #16 [<c000d67c>] (unwind_backtrace+0x0/0xf0) from [<c000b530>] (show_stack+0x10/0x14) [<c000b530>] (show_stack+0x10/0x14) from [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) [<c04ba07c>] (print_usage_bug.part.27+0x218/0x280) from [<c0053058>] (mark_lock+0x538/0x6bc) [<c0053058>] (mark_lock+0x538/0x6bc) from [<c005326c>] (mark_held_locks+0x90/0x124) [<c005326c>] (mark_held_locks+0x90/0x124) from [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) [<c00533a8>] (trace_hardirqs_on_caller+0xa8/0x23c) from [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) [<c04c1c60>] (_raw_spin_unlock_irq+0x24/0x5c) from [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) [<c004ac8c>] (__do_adjtimex+0x17c/0x65c) from [<c004a404>] (do_adjtimex+0x84/0xbc) [<c004a404>] (do_adjtimex+0x84/0xbc) from [<c001d62c>] (SyS_adjtimex+0x50/0xa8) [<c001d62c>] (SyS_adjtimex+0x50/0xa8) from [<c0008c40>] (ret_fast_syscall+0x0/0x44) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/