> From: Sergey Senozhatsky [mailto:sergey.senozhatsky.w...@gmail.com] > Sent: Thursday, January 28, 2016 11:38 AM > To: Byungchul Park > Cc: a...@linux-foundation.org; mi...@kernel.org; linux- > ker...@vger.kernel.org; akinobu.m...@gmail.com; j...@suse.cz; > torva...@linux-foundation.org; pe...@hurleysoftware.com; > sergey.senozhatsky.w...@gmail.com; sergey.senozhat...@gmail.com > Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in > the debug code > > ok, I'll repeat the questions. > > under what circumstances you hit this problem? ...memory corruption, cpu > core has been powered off, while it owned the spin_lock... your irqsave > didn't work?
I think these are not the my case. > > the thing is, it's really-really hard to lockup in console_trylock()... > > int down_trylock(struct semaphore *sem) > { > unsigned long flags; > int count; > > raw_spin_lock_irqsave(&sem->lock, flags); <<<<<< um... I also think it's hard, but a backtrace said the lockup happened here. > count = sem->count - 1; > if (likely(count >= 0)) > sem->count = count; > raw_spin_unlock_irqrestore(&sem->lock, flags); > > return (count < 0); > } > > was not able to dereference sem->count? `*sem' was corrupted? or because > sem->lock was corrupted? in any of those cases you solve the problem from > the wrong side. if you have a memory corruption then it can corrupt What I solved here is to make it possible to print what the problem is, by the spinlock debug code instead of system lockup while printing a debug message. I think it's not wrong. > anything, > including, for example, console driver spin_lock. > > > suppose, that you hit do_raw_spin_lock()->spin_dump(), which means that > the > spin_lock was not corrupted, it passed debug_spin_lock_before() after all. > and that spin_lock was taken for longer than `loops_per_jiffy * HZ', while > other CPU was doing > > count = sem->count - 1; > if (likely(count >= 0)) > sem->count = count; > > ??? > > was the CPU that owned the lock alive? (h/w fault, perhaps?). I am just curious.. Is it impossible but by h/w fault? e.g. timing to allocate virtual cpus to a guest machine when using virtual machine and so on. > > > dunno... yes, this > printk()->console_trylock()->do_raw_spin_lock()->spin_dump()- > >printk() > is possible, but it's possible only when your system is screwed up badly, > so > badly that this spin_dump() loop is not really a problem, IMHO. IMHO, even though a system is screwed up badly, the spinlock debug code have to print the information about the problem without lockup. > > if I'm missing something terribly obvious here, then please give more > details. There are already codes to prevent the recursive cycle in the bug case, but not for the just suspected case. I just made it possible for the latter case. That's all this patch is doing, now. :) thanks, byungchul > > -ss > > > > Signed-off-by: Byungchul Park <byungchul.p...@lge.com> > > > --- > > > include/linux/debug_locks.h | 4 ++++ > > > kernel/locking/spinlock_debug.c | 14 +++++++++++--- > > > 2 files changed, 15 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h > > > index 822c135..b0f977e 100644 > > > --- a/include/linux/debug_locks.h > > > +++ b/include/linux/debug_locks.h > > > @@ -10,6 +10,10 @@ struct task_struct; > > > extern int debug_locks; > > > extern int debug_locks_silent; > > > > > > +static inline void __debug_locks_on(void) > > > +{ > > > + debug_locks = 1; > > > +} > > > > > > static inline int __debug_locks_off(void) > > > { > > > diff --git a/kernel/locking/spinlock_debug.c > b/kernel/locking/spinlock_debug.c > > > index 0374a59..65177ba 100644 > > > --- a/kernel/locking/spinlock_debug.c > > > +++ b/kernel/locking/spinlock_debug.c > > > @@ -113,11 +113,19 @@ static void __spin_lock_debug(raw_spinlock_t > *lock) > > > return; > > > __delay(1); > > > } > > > - /* lockup suspected: */ > > > - spin_dump(lock, "lockup suspected"); > > > + > > > + /* > > > + * We should prevent calling printk() further, since it would cause > > > + * an infinite recursive cycle if it's called from printk()! > > > + */ > > > + if (__debug_locks_off()) { > > > + /* lockup suspected: */ > > > + spin_dump(lock, "lockup suspected"); > > > #ifdef CONFIG_SMP > > > - trigger_all_cpu_backtrace(); > > > + trigger_all_cpu_backtrace(); > > > #endif > > > + __debug_locks_on(); > > > + } > > > > > > /* > > > * The trylock above was causing a livelock. Give the lower level > arch > > > -- > > > 1.9.1 > >