On 2021/01/23 15:35, Alexey Kardashevskiy wrote:
> this behaves quite different but still produces the message (i have
> show_workqueue_state() right after the bug message):
>
>
> [ 85.803991] BUG: MAX_LOCKDEP_KEYS too low!
> [ 85.804338] turning off the locking correctness validator.
> [ 85.804474] Showing busy workqueues and worker pools:
> [ 85.804620] workqueue events_unbound: flags=0x2
> [ 85.804764] pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/512 refcnt=3
> [ 85.804965] in-flight: 81:bpf_map_free_deferred
> [ 85.805229] workqueue events_power_efficient: flags=0x80
> [ 85.805357] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [ 85.805558] in-flight: 57:gc_worker
> [ 85.805877] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 82 24
> [ 85.806147] pool 16: cpus=0-7 flags=0x4 nice=0 hung=69s workers=3 idle: 7 251
> ^C[ 100.129747] maxlockdep (5104) used greatest stack depth: 8032 bytes left
>
> root@le-dbg:~# grep "lock-classes" /proc/lockdep_stats
> lock-classes: 8192 [max: 8192]
>

Right. Hillf's patch can reduce the number of active workqueue worker threads, because only one worker thread can call bpf_map_free_deferred() (which is nice because it avoids bloat of the active= and refcnt= fields). But Hillf's patch does not fix the cause of the "BUG: MAX_LOCKDEP_KEYS too low!" message.

As Dmitry mentioned, the bpf syscall allows producing work items faster than bpf_map_free_deferred() can consume them. (A similar problem is observed for network namespaces.)

Unless there is a bug that prevents bpf_map_free_deferred() from completing, the classical solution is to put pressure on producers (i.e. slow down the bpf syscall side) in a way that consumers (i.e. __bpf_map_put()) will not schedule thousands of backlogged "struct bpf_map" work items.
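
To illustrate what "putting pressure on producers" means, here is a minimal userspace sketch. It is not kernel code and not a proposed patch. The names MAX_PENDING, struct sim, produce_one(), consume_one() and run_sim() are all made up for this example, and the "wait" is modelled by letting the consumer run in place of real blocking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on queued-but-unfinished work items. */
#define MAX_PENDING 64

struct sim {
    size_t pending;     /* items queued but not yet freed */
    size_t max_pending; /* high-water mark of the backlog */
    size_t completed;   /* items the consumer has finished */
    size_t throttled;   /* times the producer had to wait */
};

/* Consumer side: finish one deferred-free item, if there is one. */
static void consume_one(struct sim *s)
{
    if (s->pending == 0)
        return;
    s->pending--;
    s->completed++;
}

/* Producer side: queue one item, but only once there is room. */
static void produce_one(struct sim *s)
{
    if (s->pending >= MAX_PENDING) {
        s->throttled++;
        /* Stands in for blocking until the worker catches up. */
        while (s->pending >= MAX_PENDING)
            consume_one(s);
    }
    s->pending++;
    if (s->pending > s->max_pending)
        s->max_pending = s->pending;
    /* The backlog never exceeds the cap. */
    assert(s->pending <= MAX_PENDING);
}

/*
 * Run n producer steps; the consumer only gets to run once every
 * `ratio` steps, i.e. the producer is faster than the consumer.
 */
static struct sim run_sim(size_t n, size_t ratio)
{
    struct sim s = {0};

    for (size_t i = 0; i < n; i++) {
        produce_one(&s);
        if (ratio && (i + 1) % ratio == 0)
            consume_one(&s);
    }
    return s;
}
```

Calling run_sim(10000, 4) from a main() models a producer that is four times faster than the consumer. Without the check in produce_one() the backlog would grow to thousands of items; with it, the backlog stays at or below MAX_PENDING and the producer pays the cost by waiting.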