two small corrections. On (01/28/16 19:41), Sergey Senozhatsky wrote: [..] > > Unfortunately, it's not reproduced anymore. > > > > If it's clearly a spinlock caller's bug as you said, modifying the > > spinlock debug code does not help it at all. But I found there's a > > possiblity in the debug code *itself* to cause a lockup. So I tried > > to fix it. What do you think about it? > > ah... silly me... you mean the first CPU that triggers the spin_dump() will ^^^ this, of course, is true for console_sem->lock and logbuf_lock only.
> deadlock itself, so the rest of CPUs will see endless recursive > spin_lock()->spin_dump()->spin_lock()->spin_dump() calls? > > like the one below? > > > CPUZ is doing vprintk_emit()->spin_lock(), CPUA is the spin_lock's owner > > CPUZ -> vprintk_emit() > __spin_lock_debug() > for (i = 0; i < `loops_per_jiffy * HZ'; i++) { << wait for > the lock > if (arch_spin_trylock()) > return; > __delay(1); > } > spin_dump() << lock is still owned by CPUA > { -> vprintk_emit() > __spin_lock_debug() > for (...) { > if (arch_spin_trylock()) > return; > __delay(1); > } - << CPUA unlocked the lock > spin_dump() > { -> vprintk_emit() > __spin_lock_debug() the "<< CPUA unlocked the lock" line better be here. to make it correct. + << CPUA unlocked the lock > for (...) { > if > (arch_spin_trylock()) << success!! > /* CPUZ now owns the > lock */ > return; > } > } > > << we return here with the spin_lock being owned by > this CPUZ > > trigger_all_cpu_backtrace() > > << and... now it does the arch_spin_lock() > /* > * The trylock above was causing a livelock. > Give the lower level arch > * specific lock code a chance to acquire the > lock. We have already > * printed a warning/backtrace at this point. > The non-debug arch > * specific code might actually succeed in > acquiring the lock. If it is > * not successful, the end-result is the same - > there is no forward > * progress. > */ > arch_spin_lock(&lock->raw_lock); > > << which obviously dealocks this CPU... > } > > trigger_all_cpu_backtrace() > > arch_spin_lock() > > > > > so > "the CPUZ is now keeping the lock forever, and not going to release it" > and > "CPUA-CPUX will do > vprintk_emit()->spin_lock()->spin_dump()->vprintk_emit()->..." > > > > My apologies for not getting it right the first time. Sorry! > > Can you please update your bug description in the commit message? > It's the deadlock that is causing the recursion on other CPUs in the > first place. -ss