> Date: Mon, 5 May 2025 18:08:19 +0200
> From: Manuel Bouyer <bou...@antioche.eu.org>
>
> still trying to debug panics/hangs on a heavily loaded web server

What kernel version?

> I got a hard hang;

What does `hard hang' mean?  Is there a heartbeat panic?

Can you share the full output of ps, ps/w, and show all tstiles?  And
can you show the stack traces for all CPUs with `mach cpu N'?

> db{0}> mach cpu 2
> using CPU 2
> db{0}> tr
> _kernel_lock() at netbsd:_kernel_lock+0xd5
> mb_drain() at netbsd:mb_drain+0x17
> pool_grow() at netbsd:pool_grow+0x3b9
> pool_get() at netbsd:pool_get+0x3c7
> [...]
>
> I wonder if we can have a deadlock here: CPU 2 holds mbuf pool's lock and
> tries to get _kernel_lock(). It looks like the softint thread on CPU 0
> holds the kernel_lock (as it's not running with NET_MPSAFE) and tries
> to get the mbuf pool's lock.

This deadlock doesn't make sense, because we drop the pool lock around
the drain hook (mb_drain):

   1129 		/*
   1130 		 * Since the drain hook is going to free things
   1131 		 * back to the pool, unlock, call the hook, re-lock,
   1132 		 * and check the hardlimit condition again.
   1133 		 */
   1134 		mutex_exit(&pp->pr_lock);
   1135 		(*pp->pr_drain_hook)(pp->pr_drain_hook_arg, flags);
   1136 		mutex_enter(&pp->pr_lock);
   1137 		if (pp->pr_nout < pp->pr_hardlimit)
   1138 			goto startover;

https://nxr.netbsd.org/xref/src/sys/kern/subr_pool.c?r=1.293#1129

> Other CPUs are also trying to get the kernel_lock or the mbuf's pool lock.
> Several are in:
> mutex_vector_enter() at netbsd:mutex_vector_enter+0x209
> tcp_timer_rexmt() at netbsd:tcp_timer_rexmt+0x28
> callout_softclock() at netbsd:callout_softclock+0xd2
> softint_dispatch() at netbsd:softint_dispatch+0x11c

tcp_timer_rexmt+0x28 is likely the first call after the function
prologue, so I suspect this is waiting for softnet_lock, not the mbuf
pool lock:

    300 void
    301 tcp_timer_rexmt(void *arg)
    302 {
    303 	struct tcpcb *tp = arg;
    304 	uint32_t rto;
    305 #ifdef TCP_DEBUG
    306 	struct socket *so = NULL;
    307 	short ostate;
    308 #endif
    309
    310 	mutex_enter(softnet_lock);

https://nxr.netbsd.org/xref/src/sys/netinet/tcp_timer.c?r=1.99#310
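
To illustrate why the unlock/call/re-lock pattern around the drain hook rules out the suspected hold-and-wait, here is a rough userspace sketch using pthreads.  It is not NetBSD code: toy_pool, toy_get, toy_drain, toy_demo, and big_lock are hypothetical stand-ins for the pool, pool_get, mb_drain, and kernel_lock, and the hook here simply gives one item back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kernel_lock; not a NetBSD API. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical toy pool, loosely modelled on pr_lock/pr_nout/pr_hardlimit. */
struct toy_pool {
	pthread_mutex_t lock;		/* plays the role of pp->pr_lock */
	unsigned nout;			/* items currently handed out */
	unsigned hardlimit;		/* maximum items handed out */
	void (*drain_hook)(void *);
	void *drain_arg;
};

/*
 * Drain hook: takes the big lock, then the pool lock, to return one
 * item.  If the caller still held the pool lock, this would block.
 */
static void
toy_drain(void *arg)
{
	struct toy_pool *pp = arg;

	pthread_mutex_lock(&big_lock);
	pthread_mutex_lock(&pp->lock);
	if (pp->nout > 0)
		pp->nout--;
	pthread_mutex_unlock(&pp->lock);
	pthread_mutex_unlock(&big_lock);
}

/* Get one item; at the hard limit, drop the lock around the hook. */
static bool
toy_get(struct toy_pool *pp)
{
	bool ok;

	pthread_mutex_lock(&pp->lock);
	if (pp->nout >= pp->hardlimit && pp->drain_hook != NULL) {
		pthread_mutex_unlock(&pp->lock);	/* unlock */
		(*pp->drain_hook)(pp->drain_arg);	/* call the hook */
		pthread_mutex_lock(&pp->lock);		/* re-lock */
	}
	ok = pp->nout < pp->hardlimit;		/* re-check the limit */
	if (ok)
		pp->nout++;
	pthread_mutex_unlock(&pp->lock);
	return ok;
}

/* Two gets against a limit of one; the second must go through the hook. */
unsigned
toy_demo(void)
{
	struct toy_pool p = { .nout = 0, .hardlimit = 1 };
	bool first, second;

	pthread_mutex_init(&p.lock, NULL);
	p.drain_hook = toy_drain;
	p.drain_arg = &p;

	first = toy_get(&p);
	second = toy_get(&p);
	assert(first && second);

	pthread_mutex_destroy(&p.lock);
	return p.nout;
}
```

In this sketch the thread never holds the pool lock while waiting for big_lock, so a thread holding big_lock and waiting for the pool lock cannot form a cycle with it through this path.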