On Thu, Dec 28, 2017 at 5:05 PM, Tom Ivar Helbekkmo
<[email protected]> wrote:
> Ryota Ozaki <[email protected]> writes:
>
>> I think the below patch fixes the above issue, but probably
>> there is a better solution.
>
> Looks like didn't -- it just changed it a little bit.  Just like the
> last time, the hang happened while reading email over IMAP, which
> exercises disk and network at the same time, while the machine was busy
> doing a parallellized system build in the background.  This time,
> though, I got a core dump.  Here's the hang (the active process on this
> CPU is the IMAP server):

Oh, my patch failed to keep the SPL at IPL_VM because mutex_exit
restores the SPL that was in effect when mutex_enter was called.
So I have to call splvm before mutex_enter.

Could you try the 2nd patch:
  http://www.netbsd.org/~ozaki-r/fix-pool_catchup.diff

>
> __cpu_simple_lock_try() at __cpu_simple_lock_try+0x9
> pool_grow() at pool_grow+0x55d
> pool_catchup() at pool_catchup+0x32
> pool_get() at pool_get+0x492
> pool_cache_get_slow() at pool_cache_get_slow+0x1b4
> pool_cache_get_paddr() at pool_cache_get_paddr+0x275
> m_get() at m_get+0x2a
> m_gethdr() at m_gethdr+0x9
> wm_add_rxbuf() at wm_add_rxbuf+0x3a
> wm_rxeof() at wm_rxeof+0x146
> wm_intr_legacy() at wm_intr_legacy+0xa1
> intr_biglock_wrapper() at intr_biglock_wrapper+0x1d
> Xintr_ioapic_level2() at Xintr_ioapic_level2+0xf7
> --- interrupt ---
> Xspllower() at Xspllower+0xe
> uvm_km_kmem_alloc() at uvm_km_kmem_alloc+0x139
> pool_page_alloc() at pool_page_alloc+0x2c
> pool_grow() at pool_grow+0x24f
> pool_catchup() at pool_catchup+0x32
> pool_get() at pool_get+0x492
> pool_cache_get_slow() at pool_cache_get_slow+0x1b4
> pool_cache_get_paddr() at pool_cache_get_paddr+0x275
> m_get() at m_get+0x2a
> m_gethdr() at m_gethdr+0x9
> sosend() at sosend+0x35a
> soo_write() at soo_write+0x2c
> dofilewrite() at dofilewrite+0x97
> sys_write() at sys_write+0x5f
> syscall() at syscall+0x1d8
> --- syscall (number 4) ---
>
> The only other CPU that looks interesting has this (copied from a
> photograph of the console, as crash(8) doesn't know about CPUs):
>
> _kernel_lock()
> ip_slowtimo()
> pfslowtimo()
> callout_softclock()
> softint_dispatch()

This is correct. intr_biglock_wrapper in the first backtrace holds
KERNEL_LOCK, and this _kernel_lock() is waiting for it to be released.

Thanks,
  ozaki-r
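
To illustrate the ordering problem described above (splvm after
mutex_enter vs. splvm before mutex_enter), here is a minimal userland
sketch. It is not kernel code and not the contents of either patch:
cur_ipl, the model_* functions, struct model_mutex and the numeric IPL
values are all hypothetical stand-ins, and the model assumes only that
a spin mutex saves the SPL on enter and restores that saved value on
exit.

```c
#include <assert.h>

/* Illustrative priority levels; the numeric values are assumptions. */
enum { IPL_NONE = 0, IPL_VM = 4 };

/* Simulated SPL of the current CPU (hypothetical stand-in). */
static int cur_ipl = IPL_NONE;

/* Raise the SPL to at least `ipl`; return the previous level. */
static int model_splraise(int ipl)
{
	int old = cur_ipl;

	if (ipl > cur_ipl)
		cur_ipl = ipl;
	return old;
}

/* Set the SPL back to a previously saved level. */
static void model_splx(int ipl)
{
	cur_ipl = ipl;
}

/* Toy spin mutex: remembers the SPL seen at enter time. */
struct model_mutex {
	int ipl;	/* level the mutex raises to */
	int saved_spl;	/* level to restore on exit */
};

static void model_mutex_enter(struct model_mutex *m)
{
	m->saved_spl = model_splraise(m->ipl);
}

static void model_mutex_exit(struct model_mutex *m)
{
	/* Restores whatever was in effect at enter time. */
	model_splx(m->saved_spl);
}

/* Raise after enter: the exit drops the SPL back to IPL_NONE. */
static int window_ipl_raise_after_enter(void)
{
	struct model_mutex m = { IPL_VM, IPL_NONE };

	cur_ipl = IPL_NONE;
	model_mutex_enter(&m);		/* saves IPL_NONE */
	(void)model_splraise(IPL_VM);	/* already at IPL_VM, no effect */
	model_mutex_exit(&m);		/* restores IPL_NONE */
	return cur_ipl;			/* level while the lock is dropped */
}

/* Raise before enter: the exit restores IPL_VM, so it is kept. */
static int window_ipl_raise_before_enter(void)
{
	struct model_mutex m = { IPL_VM, IPL_NONE };
	int s, seen;

	cur_ipl = IPL_NONE;
	s = model_splraise(IPL_VM);	/* s is IPL_NONE */
	model_mutex_enter(&m);		/* saves IPL_VM */
	model_mutex_exit(&m);		/* restores IPL_VM */
	seen = cur_ipl;			/* level while the lock is dropped */
	model_splx(s);			/* finally drop back to IPL_NONE */
	return seen;
}
```

In this model the first function leaves the unlocked window at IPL_NONE,
so an interrupt handler could run there and re-enter the allocator,
while the second keeps the window at IPL_VM until the explicit
model_splx().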
