Two small corrections.

On (01/28/16 19:41), Sergey Senozhatsky wrote:
[..]
> > Unfortunately, it's not reproduced anymore.
> > 
> > If it's clearly a spinlock caller's bug as you said, modifying the
> > spinlock debug code does not help it at all. But I found there's a
> > possibility that the debug code *itself* can cause a lockup. So I tried
> > to fix it. What do you think about it?
> 
> ah... silly me... you mean the first CPU that triggers the spin_dump() will
                                        ^^^ this, of course, is true for
                                        console_sem->lock and logbuf_lock
                                        only.

> deadlock itself, so the rest of CPUs will see endless recursive
> spin_lock()->spin_dump()->spin_lock()->spin_dump() calls?
> 
> like the one below?
> 
> 
> CPUZ is doing vprintk_emit()->spin_lock(), CPUA is the spin_lock's owner
> 
> CPUZ -> vprintk_emit()
>       __spin_lock_debug()
>               for (i = 0; i < `loops_per_jiffy * HZ'; i++) {   << wait for the lock
>                       if (arch_spin_trylock())
>                               return;
>                       __delay(1);
>                       }
>               spin_dump()       << lock is still owned by CPUA
>               {   -> vprintk_emit()
>                       __spin_lock_debug()
>                               for (...) {
>                                       if (arch_spin_trylock())
>                                               return;
>                                       __delay(1);
>                               }
-               << CPUA unlocked the lock
>                               spin_dump()
>                               {   -> vprintk_emit()
>                                       __spin_lock_debug()
      the "<< CPUA unlocked the lock" line better be here. to make it correct.

+               << CPUA unlocked the lock
>                                               for (...) {
>                                                       if (arch_spin_trylock())   << success!!
>                                                       /* CPUZ now owns the lock */
>                                                               return;
>                                               }
>                               }
> 
>                       << we return here with the spin_lock being owned by this CPUZ
> 
>                               trigger_all_cpu_backtrace()
> 
>                       << and... now it does the arch_spin_lock()
>                               /*
>                                * The trylock above was causing a livelock.  Give the lower level arch
>                                * specific lock code a chance to acquire the lock. We have already
>                                * printed a warning/backtrace at this point. The non-debug arch
>                                * specific code might actually succeed in acquiring the lock.  If it is
>                                * not successful, the end-result is the same - there is no forward
>                                * progress.
>                                */
>                               arch_spin_lock(&lock->raw_lock);
> 
>                       << which obviously deadlocks this CPU...
>               }
> 
>               trigger_all_cpu_backtrace()
> 
>               arch_spin_lock()
> 
> 
> 
> 
> so
>       "the CPUZ is now keeping the lock forever, and not going to release it"
> and
>       "CPUA-CPUX will do vprintk_emit()->spin_lock()->spin_dump()->vprintk_emit()->..."
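
For reference, the debug slow path the trace above walks through looks
roughly like this (paraphrased from lib/spinlock_debug.c of that kernel,
so details may differ slightly):

static void __spin_lock_debug(raw_spinlock_t *lock)
{
        u64 i;
        u64 loops = loops_per_jiffy * HZ;

        for (i = 0; i < loops; i++) {
                if (arch_spin_trylock(&lock->raw_lock))
                        return;
                __delay(1);
        }
        /* lockup suspected: */
        spin_dump(lock, "lockup suspected");
#ifdef CONFIG_SMP
        trigger_all_cpu_backtrace();
#endif

        /*
         * The trylock above was causing a livelock.  Give the lower level arch
         * specific lock code a chance to acquire the lock. We have already
         * printed a warning/backtrace at this point. The non-debug arch
         * specific code might actually succeed in acquiring the lock.  If it is
         * not successful, the end-result is the same - there is no forward
         * progress.
         */
        arch_spin_lock(&lock->raw_lock);
}

Once the nested spin_dump()->vprintk_emit() level wins its arch_spin_trylock(),
the outer level knows nothing about it and falls through to
trigger_all_cpu_backtrace() and then arch_spin_lock() on a lock this CPU now
owns -- the deadlock in the trace.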
> 
> 
> 
> My apologies for not getting it right the first time. Sorry!
> 
> Can you please update your bug description in the commit message?
> It's the deadlock that is causing the recursion on other CPUs in the
> first place.
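
And to expand on the console_sem->lock / logbuf_lock note above: a rough,
simplified sketch (not verbatim kernel source) of why those two locks are the
special case. spin_dump() reports the suspected lockup via printk(), and
printk() lands back in vprintk_emit(), which itself takes logbuf_lock (and
console_sem's spinlock through the semaphore code), so a suspected lockup on
one of printk's own locks re-enters the very same lock:

static void spin_dump(raw_spinlock_t *lock, const char *msg)
{
        /*
         * Simplified: the real spin_dump() also prints owner, magic, etc.
         *
         * The printk() below goes to vprintk_emit(), which takes
         * logbuf_lock (and console_sem->lock via console_trylock()/up()).
         * If the lock being dumped *is* one of those, this recursion
         * re-enters __spin_lock_debug() on the very same lock.
         */
        printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
               msg, raw_smp_processor_id(),
               current->comm, task_pid_nr(current));
        dump_stack();
}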

        -ss
