On Sun, Nov 22, 2020 at 2:56 AM Tetsuo Handa
<penguin-ker...@i-love.sakura.ne.jp> wrote:
>
> On 2020/11/20 18:27, Dmitry Vyukov wrote:
> > Peter, so far it looks like just a very large, but normal graph to me.
> > The cheapest solution from an engineering point of view would be just
> > to increase the constants. I assume a 2x increase should buy us a lot
> > of time before we overflow again.
> > I can think of more elaborate solutions, e.g. using bitmasks to
> > represent hot leaf and top-level locks. But that would both increase the
> > resulting code complexity (no uniform representation anymore, all code
> > would need to deal with different representations) and require a time
> > investment (which, at least for me, is hard to justify compared to
> > just throwing a bit more machine memory at it). And in the end it
> > won't really reduce the size of the graph.
> > What do you think?
> >
>
> Yes, I think it is a normal graph; it is simply that syzkaller kernels tend
> to record a few times more dependencies than my idle kernel (shown at the
> bottom).
>
> Peter, you guessed that the culprit is sysfs at
> https://lkml.kernel.org/r/20200916115057.go2...@hirez.programming.kicks-ass.net ,
> but syzbot reported at
> https://syzkaller.appspot.com/text?tag=MachineInfo&x=99b8f2b092d9714f
> that "BUG: MAX_LOCKDEP_ENTRIES too low!" can occur on a VM with only 2 CPUs.
> Does your guess really capture the culprit?
>
> We could improve a few locks, but as a whole we won't be able to afford
> keeping the sum of individual dependencies under the current threshold.
> Therefore, allowing lockdep to tune the capacity, and allowing syzkaller to
> dump state when the capacity is reached, would be the way to go.
>
>
> # cat /proc/lockdep_stats
> lock-classes: 1236 [max: 8192]
> direct dependencies: 9610 [max: 32768]
> indirect dependencies: 40401
> all direct dependencies: 174635
> dependency chains: 11398 [max: 65536]
> dependency chain hlocks used: 42830 [max: 327680]
> dependency chain hlocks lost: 0
> in-hardirq chains: 61
> in-softirq chains: 414
> in-process chains: 10923
> stack-trace entries: 93041 [max: 524288]
> number of stack traces: 4997
> number of stack hash chains: 4292
> combined max dependencies: 281074520
> hardirq-safe locks: 50
> hardirq-unsafe locks: 805
> softirq-safe locks: 146
> softirq-unsafe locks: 722
> irq-safe locks: 155
> irq-unsafe locks: 805
> hardirq-read-safe locks: 2
> hardirq-read-unsafe locks: 129
> softirq-read-safe locks: 11
> softirq-read-unsafe locks: 123
> irq-read-safe locks: 11
> irq-read-unsafe locks: 129
> uncategorized locks: 224
> unused locks: 0
> max locking depth: 15
> max bfs queue depth: 215
> chain lookup misses: 11664
> chain lookup hits: 37393935
> cyclic checks: 11053
> redundant checks: 0
> redundant links: 0
> find-mask forwards checks: 1588
> find-mask backwards checks: 1779
> hardirq on events: 17502380
> hardirq off events: 17502376
> redundant hardirq ons: 0
> redundant hardirq offs: 0
> softirq on events: 90845
> softirq off events: 90845
> redundant softirq ons: 0
> redundant softirq offs: 0
> debug_locks: 1
>
> zapped classes: 0
> zapped lock chains: 0
> large chain blocks: 1
>
> # awk ' { if ($2 == "OPS:") print $5" "$9 } ' /proc/lockdep | sort -rV | head -n 30
> 423 (wq_completion)events
> 405 (wq_completion)events_unbound
> 393 &f->f_pos_lock
> 355 &p->lock
> 349 sb_writers#3
> 342 sb_writers#6
> 338 &of->mutex
> 330 (work_completion)(&entry->work)
> 330 pernet_ops_rwsem
> 289 epmutex
> 288 &ep->mtx
> 281 tty_mutex
> 280 &tty->legacy_mutex
> 273 &tty->legacy_mutex/1
> 269 &tty->ldisc_sem
> 268 (wq_completion)ipv6_addrconf
> 266 (work_completion)(&(&ifa->dad_work)->work)
> 266 (linkwatch_work).work
> 266 (addr_chk_work).work
> 266 &ldata->atomic_read_lock
> 265 (work_completion)(&buf->work)
> 265 rtnl_mutex
> 263 &tty->atomic_write_lock
> 262 &buf->lock
> 261 &tty->termios_rwsem
> 242 &port->buf.lock/1
> 242 kn->active#40
> 241 &o_tty->termios_rwsem/1
> 240 registration_lock
> 239 &ldata->output_lock
>
> # awk ' { if ($2 == "OPS:") print $7" "$9 } ' /proc/lockdep | sort -rV | head -n 30
> 642 pool_lock
> 641 &obj_hash[i].lock
> 582 tk_core.seq.seqcount
> 561 hrtimer_bases.lock
> 560 &rt_rq->rt_runtime_lock
> 559 &rt_b->rt_runtime_lock
> 559 &cp->lock
> 559 &cfs_rq->removed.lock
> 559 &cfs_b->lock
> 558 &rq->lock
> 550 &tsk->delays->lock
> 549 &p->pi_lock
> 506 &base->lock
> 504 &n->list_lock
> 432 &____s->seqcount
> 404 &x->wait#10
> 401 &pool->lock
> 369 &zone->lock
> 330 rcu_node_0
> 326 &(kretprobe_table_locks[i].lock)
> 326 pgd_lock
> 324 &pgdat->lru_lock
> 323 lock#5
> 321 &page_wait_table[i]
> 319 ptlock_ptr(page)#2/1
> 318 ptlock_ptr(page)#2
> 316 &sem->wait_lock
> 316 &mm->page_table_lock
> 316 ptlock_ptr(page)
> 315 &anon_vma->rwsem
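For reference, and as a sketch only: if I read the /proc/lockdep format correctly, $5 and $7 in the awk one-liners above are the FD: (forward dependency) and BD: (backward dependency) counts, with $9 the class name. The [max: ...] numbers in the lockdep_stats dump correspond to compile-time constants in kernel/locking/lockdep_internals.h and the lockdep headers; the values below are inferred from that dump (non-CONFIG_LOCKDEP_SMALL case), so treat them as illustrative rather than authoritative. The "2x increase" suggested above would amount to doubling MAX_LOCKDEP_ENTRIES and MAX_STACK_TRACE_ENTRIES and bumping the *_BITS values by one.

/*
 * Sketch of the lockdep capacity constants, with the values inferred
 * from the [max: ...] columns in the lockdep_stats output above.
 */
#define MAX_LOCKDEP_KEYS_BITS		13	/* lock-classes:  1UL << 13 = 8192 */
#define MAX_LOCKDEP_ENTRIES		32768UL	/* direct dependencies             */
#define MAX_LOCKDEP_CHAINS_BITS		16
#define MAX_LOCKDEP_CHAINS		(1UL << MAX_LOCKDEP_CHAINS_BITS)  /* 65536  */
#define MAX_LOCKDEP_CHAIN_HLOCKS	(MAX_LOCKDEP_CHAINS * 5)          /* 327680 */
#define MAX_STACK_TRACE_ENTRIES		524288UL	/* stack-trace entries     */

Since these are all static arrays sized at build time, doubling them costs only a fixed amount of extra kernel memory, which is the trade-off being discussed above.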
The latest syzbot logs now contain these dumps as well, if anybody is interested in more:
https://syzkaller.appspot.com/bug?id=63fc8d0501c39609dd2f268e4190ec9a72619563
https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b