changes from v4 to v5
- identified a clear scenario that drives the system crazy; at the least,
  this should not be caused by the debug code
changes from v3 to v4
- reuse existing code as much as possible to prevent an infinite
  recursive cycle

changes from v2 to v3
- avoid printk() only when a lockup is suspected, not on a real lockup,
  in which case it does not help at all
- consider not only console_sem.lock but also logbuf_lock, which is used
  by printk()

changes from v1 to v2
- only change the comment and commit message, esp. replacing "deadlock"
  with "infinite recursive cycle", since it is not an actual deadlock

thanks,
byungchul

-----8<-----
>From eed077240e0b0d9f14d91037ef1915feab85aa4d Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.p...@lge.com>
Date: Fri, 29 Jan 2016 21:23:24 +0900
Subject: [PATCH v5] lib/spinlock_debug.c: prevent a recursive cycle in the
 debug code

With CONFIG_DEBUG_SPINLOCK enabled, spin_dump() can fall into an
infinite recursive cycle: printk() -> console_trylock() ->
do_raw_spin_lock() -> spin_dump() -> printk() -> ... infinitely. When
spin_dump() is called from printk(), we should prevent the debug
spinlock code from calling printk() again in that context. It is
reasonable to skip the "lockup suspected" message, which is just a
warning, since printing it would definitely cause a real lockup.

The scenario is:

cpu0
====
printk
  console_trylock
  console_unlock
    up_console_sem
      up
        raw_spin_lock_irqsave(&sem->lock, flags)
        __up
          wake_up_process
            try_to_wake_up
              raw_spin_lock_irqsave(&p->pi_lock)
                __spin_lock_debug
                  spin_dump <=== the problem point!
                    printk
                      console_trylock
                        raw_spin_lock_irqsave(&sem->lock, flags) <=== DEADLOCK

cpu1
====
printk
  console_trylock
    raw_spin_lock_irqsave(&sem->lock, flags)
      __spin_lock_debug
        spin_dump
          printk
            ...
              <=== repeat the recursive cycle infinitely

Signed-off-by: Byungchul Park <byungchul.p...@lge.com>
---
 kernel/locking/spinlock_debug.c | 16 +++++++++++++---
 kernel/printk/printk.c          |  5 +++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..cf7bc96 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -103,6 +103,8 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
 	lock->owner_cpu = -1;
 }
 
+extern int is_console_lock(raw_spinlock_t *lock);
+
 static void __spin_lock_debug(raw_spinlock_t *lock)
 {
 	u64 i;
@@ -113,11 +115,19 @@ static void __spin_lock_debug(raw_spinlock_t *lock)
 			return;
 		__delay(1);
 	}
-	/* lockup suspected: */
-	spin_dump(lock, "lockup suspected");
+
+	/*
+	 * If this function was called from printk(), we must not
+	 * call printk() again, or it would cause an infinite
+	 * recursive cycle!
+	 */
+	if (likely(!is_console_lock(lock))) {
+		/* lockup suspected: */
+		spin_dump(lock, "lockup suspected");
 #ifdef CONFIG_SMP
-	trigger_all_cpu_backtrace();
+		trigger_all_cpu_backtrace();
 #endif
+	}
 
 	/*
 	 * The trylock above was causing a livelock. Give the lower level arch
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2ce8826..568ab11 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1981,6 +1981,11 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
 }
 #endif
 
+int is_console_lock(raw_spinlock_t *lock)
+{
+	return lock == &console_sem.lock;
+}
+
 static int __add_preferred_console(char *name, int idx, char *options,
 				   char *brl_options)
 {
-- 
1.9.1