RE: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

byungchul.park Wed, 27 Jan 2016 20:52:59 -0800

> From: Sergey Senozhatsky [mailto:sergey.senozhatsky.w...@gmail.com]
> Sent: Thursday, January 28, 2016 11:38 AM
> To: Byungchul Park
> Cc: a...@linux-foundation.org; mi...@kernel.org; linux-
> ker...@vger.kernel.org; akinobu.m...@gmail.com; j...@suse.cz;
> torva...@linux-foundation.org; pe...@hurleysoftware.com;
> sergey.senozhatsky.w...@gmail.com; sergey.senozhat...@gmail.com
> Subject: Re: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in
> the debug code
> 
> ok, I'll repeat the questions.
> 
> under what circumstances you hit this problem? ...memory corruption, cpu
> core has been powered off, while it owned the spin_lock... your irqsave
> didn't work?


I think these are not the my case.

> 
> the thing is, it's really-really hard to lockup in console_trylock()...
> 
> int down_trylock(struct semaphore *sem)
> {
>         unsigned long flags;
>         int count;
> 
>         raw_spin_lock_irqsave(&sem->lock, flags);   <<<<<<  um...

I also think it's hard, but a backtrace said the lockup happened here.

>         count = sem->count - 1;
>         if (likely(count >= 0))
>                 sem->count = count;
>         raw_spin_unlock_irqrestore(&sem->lock, flags);
> 
>         return (count < 0);
> }
> 
> was not able to dereference sem->count? `*sem' was corrupted? or because
> sem->lock was corrupted? in any of those cases you solve the problem from
> the wrong side. if you have a memory corruption then it can corrupt

What I solved here is to make it possible to print what the problem is, by
the spinlock debug code instead of system lockup while printing a debug
message. I think it's not wrong.

> anything,
> including, for example, console driver spin_lock.
> 
> 
> suppose, that you hit do_raw_spin_lock()->spin_dump(), which means that
> the
> spin_lock was not corrupted, it passed debug_spin_lock_before() after all.
> and that spin_lock was taken for longer than `loops_per_jiffy * HZ', while
> other CPU was doing
> 
>         count = sem->count - 1;
>         if (likely(count >= 0))
>                 sem->count = count;
> 
> ???
> 
> was the CPU that owned the lock alive? (h/w fault, perhaps?).

I am just curious.. Is it impossible but by h/w fault? e.g. timing to
allocate
virtual cpus to a guest machine when using virtual machine and so on.

> 
> 
> dunno... yes, this
>       printk()->console_trylock()->do_raw_spin_lock()->spin_dump()-
> >printk()
> is possible, but it's possible only when your system is screwed up badly,
> so
> badly that this spin_dump() loop is not really a problem, IMHO.

IMHO, even though a system is screwed up badly, the spinlock debug code have
to print the information about the problem without lockup.

> 
> if I'm missing something terribly obvious here, then please give more
> details.

There are already codes to prevent the recursive cycle in the bug case,
but not for the just suspected case. I just made it possible for the
latter case. That's all this patch is doing, now. :)

thanks,
byungchul

> 
>       -ss
> 
> > > Signed-off-by: Byungchul Park <byungchul.p...@lge.com>
> > > ---
> > >  include/linux/debug_locks.h     |  4 ++++
> > >  kernel/locking/spinlock_debug.c | 14 +++++++++++---
> > >  2 files changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
> > > index 822c135..b0f977e 100644
> > > --- a/include/linux/debug_locks.h
> > > +++ b/include/linux/debug_locks.h
> > > @@ -10,6 +10,10 @@ struct task_struct;
> > >  extern int debug_locks;
> > >  extern int debug_locks_silent;
> > >
> > > +static inline void __debug_locks_on(void)
> > > +{
> > > + debug_locks = 1;
> > > +}
> > >
> > >  static inline int __debug_locks_off(void)
> > >  {
> > > diff --git a/kernel/locking/spinlock_debug.c
> b/kernel/locking/spinlock_debug.c
> > > index 0374a59..65177ba 100644
> > > --- a/kernel/locking/spinlock_debug.c
> > > +++ b/kernel/locking/spinlock_debug.c
> > > @@ -113,11 +113,19 @@ static void __spin_lock_debug(raw_spinlock_t
> *lock)
> > >                   return;
> > >           __delay(1);
> > >   }
> > > - /* lockup suspected: */
> > > - spin_dump(lock, "lockup suspected");
> > > +
> > > + /*
> > > +  * We should prevent calling printk() further, since it would cause
> > > +  * an infinite recursive cycle if it's called from printk()!
> > > +  */
> > > + if (__debug_locks_off()) {
> > > +         /* lockup suspected: */
> > > +         spin_dump(lock, "lockup suspected");
> > >  #ifdef CONFIG_SMP
> > > - trigger_all_cpu_backtrace();
> > > +         trigger_all_cpu_backtrace();
> > >  #endif
> > > +         __debug_locks_on();
> > > + }
> > >
> > >   /*
> > >    * The trylock above was causing a livelock.  Give the lower level
> arch
> > > --
> > > 1.9.1
> >

RE: [PATCH v4] lib/spinlock_debug.c: prevent a recursive cycle in the debug code

Reply via email to