On Tue, Dec 26, 2017 at 5:23 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> On Tue, Dec 26, 2017 at 4:50 PM, Tom Ivar Helbekkmo
> <t...@hamartun.priv.no> wrote:
>> Ryota Ozaki <ozak...@netbsd.org> writes:
>>
>>> One possible fix has been committed.
>>>
>>> Can you update the source code and try a new kernel?
>>
>> Will do.
>
> Thanks.
>
>
>>
>> Meanwhile, before I got around to building a kernel with debug options
>> enabled, I had another hang. Got it into DDB successfully, but then I
>> managed, while looking for a way to switch between CPUs (the man page is
>> wrong in this respect), to get DDB to look at something it shouldn't
>> have, so it just said "fatal pag", and that was that.
>>
>> I did get a backtrace of CPU 0, though, and it looked interesting:
>>
>> pool_catchup()
>> pool_get()
>> pool_cache_get_slow()
>> pool_cache_get_paddr()
>> m_get()
>> m_gethdr()
>> wm_add_rxbuf()
>> wm_rxeof()
>> wm_intr_legacy()
>> intr_biglock_wrapper()
>> Xintr_ioapic_level2()
>> --- interrupt ---
>> Xspllower()
>> uvm_km_kmem_alloc()
>> pool_page_alloc()
>> pool_grow()
>> pool_catchup()
>> pool_get()
>> pool_cache_get_slow()
>> pool_cache_get_paddr()
>> m_get()
>> m_gethdr()
>> tcp_output()
>> tcp_send_wrapper()
>> sosend()
>> soo_write()
>> dofilewrite()
>> sys_write()
>> syscall()
>> --- syscall (number 4) ---
>
> Looks the below infinite loop is happening?
>
> I think we need to summon a pool expert.
>
> ozaki-r
>
>
> (Copied and modified the diagram from PR 52858)
>
> [lwp #1]
> |
> [pool_grow with PR_NOWAIT
> [set PR_GROWING and PR_GROWINGNOWAIT
> [mutex_exit(&pp->pr_lock)
> |
> (interrupted)
>
> [intr #1]
> |
> [pool_catchup
> [pool_grow with PR_NOWAIT
> [see PR_GROWING and PR_GROWINGNOWAIT are set
> [return ERESTART
> [repeat pool_grow in pool_catchup...

I think the patch below fixes the issue above, but there is probably a
better solution.

  ozaki-r

diff --git a/sys/kern/subr_pool.c b/sys/kern/subr_pool.c
index 3e6c9225482..e4be8bf0682 100644
--- a/sys/kern/subr_pool.c
+++ b/sys/kern/subr_pool.c
@@ -1247,6 +1247,7 @@ static int
 pool_catchup(struct pool *pp)
 {
 	int error = 0;
+	int s = splvm();
 
 	while (POOL_NEEDS_CATCHUP(pp)) {
 		error = pool_grow(pp, PR_NOWAIT);
@@ -1256,6 +1257,7 @@ pool_catchup(struct pool *pp)
 			break;
 		}
 	}
+	splx(s);
 
 	return error;
 }
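
For anyone who wants to poke at the loop outside the kernel, here is a rough userland model of the diagram quoted above. It is only a sketch and not kernel code: every model_* name and the MODEL_ERESTART value are made up for illustration, the single "growing" flag stands in for PR_GROWING and PR_GROWINGNOWAIT, and the retry-on-ERESTART behaviour is taken from the diagram rather than from subr_pool.c. The block_intr argument plays the role of the splvm()/splx() pair by deferring the interrupt until the grow has finished.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ERESTART (-3)	/* placeholder value for this model only */

struct model_pool {
	bool growing;	/* stands in for PR_GROWING and PR_GROWINGNOWAIT */
	int nitems;	/* items currently available */
	int minitems;	/* catch-up target */
};

static bool
model_needs_catchup(const struct model_pool *pp)
{
	return pp->nitems < pp->minitems;
}

/* Non-blocking grow: refuses while another context is already growing. */
static int
model_grow_nowait(struct model_pool *pp)
{
	if (pp->growing)
		return MODEL_ERESTART;
	pp->nitems++;
	return 0;
}

/*
 * Catch-up loop as run from the interrupt handler.  Returns the number
 * of ERESTART retries; reaching max_spins means it would never finish.
 */
static int
model_intr_catchup(struct model_pool *pp, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	while (model_needs_catchup(pp) && spins < max_spins) {
		if (model_grow_nowait(pp) == MODEL_ERESTART)
			spins++;	/* retry; the flag owner cannot run */
	}
	return spins;
}

/*
 * The LWP side: set the growing flag, then allocate a page.  Without
 * the raised IPL the interrupt lands in that window; with it, the
 * interrupt is deferred until the grow has completed.
 */
static int
model_lwp_catchup(struct model_pool *pp, bool block_intr, int max_spins)
{
	int spins = 0;

	pp->growing = true;
	if (!block_intr)
		spins = model_intr_catchup(pp, max_spins);
	pp->nitems = pp->minitems;	/* page allocated, pool refilled */
	pp->growing = false;
	if (block_intr)
		spins = model_intr_catchup(pp, max_spins);
	return spins;
}
```

Calling model_lwp_catchup() with block_intr set to false runs the interrupt-side loop up to the max_spins cap, which is the model's stand-in for the hang; with block_intr set to true the interrupt side sees a refilled pool and does not retry at all.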