On Mon, May 01, 2017 at 10:18:58PM +0200, Mark Kettenis wrote:
> > Date: Mon, 1 May 2017 20:58:29 +0100
> > From: Stuart Henderson <s...@spacehopper.org>
> >
> > Userland is non-responsive, machine is pingable, tcp connections open
> > but no banner from ssh. No failed pool requests. This kernel is from
> > today's snapshot but I saw the same with one from a couple of days
> > ago. Is there anything else I can get that might be useful?
> >
> > ddb> ps
> >    PID    TID   PPID   UID  S       FLAGS  WAIT      COMMAND
> >  99554  86967  57409    55  3         0x2  vp        cc
> >  57409  23557  97377    55  3        0x82  wait      cc
> >  97377  51254  49407    55  3    0x10008a  pause     sh
> >  71034 186155  65198     0  3        0x11  vp        perl
> >  49407 183801  58608    55  3        0x82  wait      gmake
> >  58608 251568  90720    55  3    0x10008a  pause     sh
> >  90720 294385  26849    55  3        0x82  wait      gmake
> >  26849 434857  31480    55  3    0x100088  pause     sh
> >  31480 479316   1945    55  3    0x10008a  pause     sh
> >   1945  53261   1392    55  3        0x82  wait      gmake
> >   1392 297593  28991    55  3    0x100088  pause     sh
> >  28991 101756  11650    55  3    0x10008a  pause     sh
> >  11650 273060  70062    55  3        0x82  wait      gmake
> >  70062 380995  21324    55  3        0x82  wait      gmake
> >  21324 380494  20357    55  3    0x10008a  pause     make
> >  20357 495141  79040    55  3    0x10008a  pause     sh
> >  79040 411698  40069    55  3    0x10008a  pause     make
> >  40069 407214  61289    55  3    0x10008a  pause     sh
> >  61289 440156  65198    55  3    0x10008a  pause     make
> >  16484 143829  63578    55  3        0x82  nanosleep perl
> >  63578 247597  69857    55  3    0x10008a  pause     sh
> >  69857   3708  28018    55  3    0x10008a  pause     make
> >  28018 161747      1    55  3    0x10008a  pause     sh
> >  78305 185109  40308  1000  3    0x100083  ttyin     ksh
> >  65198 454438  69872     0  3        0x93  wait      perl
> >  69872  91535  40308  1000  3    0x10008b  pause     ksh
> >  40308 108204      1  1000  3    0x100080  kqread    tmux
> >  72632 510504  69073  1000  3    0x100083  kqread    tmux
> >  69073 166246  39096  1000  3    0x10008b  pause     ksh
> >  39096 474432  39165  1000  3        0x10  vp        sshd
> >  39165 380864  95218     0  3        0x92  poll      sshd
> >  19837  75515      1     0  3    0x100003  vp        getty
> >     61 140725      1     0  3    0x100010  vp        cron
> >  33247 144573      1   110  3    0x100090  poll      sndiod
> >  85245 294054      1    99  3    0x100090  poll      sndiod
> >  20071 339430  77361    95  3    0x100092  kqread    smtpd
> >  31714 216717  77361   103  3    0x100092  kqread    smtpd
> >  38145 373966  77361    95  3    0x100092  kqread    smtpd
> >  73235 449750  77361    95  3    0x100092  kqread    smtpd
> >  52512 523411  77361    95  3    0x100092  kqread    smtpd
> >  25217  17706  77361    95  3    0x100092  kqread    smtpd
> >  77361 512649      1     0  3    0x100080  kqread    smtpd
> >  95218 352524      1     0  3        0x80  select    sshd
> >  28640 338771      0     0  3     0x14280  nfsidl    nfsio
> >  30707 131410      0     0  3     0x14280  nfsidl    nfsio
> >  26109 142203      0     0  3     0x14280  nfsidl    nfsio
> >  61054 453416      0     0  3     0x14280  nfsidl    nfsio
> >  20679 124381      1     0  3        0x80  poll      rpc.statd
> >  75142 494960      1    28  3    0x100090  poll      portmap
> >  13394 497677      1     0  3    0x100000  vp        ntpd
> >  56991 117256  27035    83  3    0x100092  poll      ntpd
> >  27035 498377      1    83  3    0x100092  poll      ntpd
> >  19071 360785   2016    74  3    0x100090  bpf       pflogd
> >   2016 326372      1     0  3        0x80  netio     pflogd
> >  75485 263260  29155    73  3    0x100090  kqread    syslogd
> >  29155 379800      1     0  3    0x100082  netio     syslogd
> >   9314 271265      1    77  3    0x100090  poll      dhclient
> >  77002 222287      1     0  3        0x80  poll      dhclient
> >   4332 479844      1     0  3        0x80  mfsidl    mount_mfs
> >  58334 330646      0     0  3     0x14200  pgzero    zerothread
> >  15557 331142      0     0  3     0x14200  aiodoned  aiodoned
> >  34557 432814      0     0  3     0x14200  syncer    update
> >  82663 208419      0     0  3     0x14200  cleaner   cleaner
> >  51853 347618      0     0  3     0x14200  reaper    reaper
> >  18753 499821      0     0  3     0x14200  pgdaemon  pagedaemon
> >  59831 415568      0     0  3     0x14200  bored     crynlk
> >  29354 478337      0     0  3     0x14200  bored     crypto
> >  33898   6377      0     0  3     0x14200  pftm      pfpurge
> >  33736 115197      0     0  3     0x14200  usbtsk    usbtask
> >  54044  43212      0     0  3     0x14200  usbatsk   usbatsk
> >  57344 273215      0     0  3     0x14200  bored     softnet
> >  88556  58351      0     0  3     0x14200  bored     systqmp
> >  69342 477649      0     0  3     0x14200  bored     systq
> >  25767 494373      0     0  3  0x40014200  bored     softclock
> > *43785  51419      0     0  7  0x40014200            idle0
> >   9892 420912      0     0  3     0x14200  kmalloc   kmthread
> >      1  54365      0     0  3        0x82  wait      init
> >      0      0     -1     0  3     0x10200  scheduler swapper
>
> The diff below might fix this.  Or it might actually turn this into a
> hard hang...
>
> Nevertheless, could you try running with it?

I have effectively the identical change on my SMP branch, so if you
want to commit this, ok drahn@

It just pushes the allocation failure into the PMAP_CANFAIL case, where
it should be handled almost gracefully. On SMP it appeared to be
necessary because of nested mutexes/locks.

>
>
> Index: pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> retrieving revision 1.33
> diff -u -p -r1.33 pmap.c
> --- pmap.c	15 Apr 2017 11:15:02 -0000	1.33
> +++ pmap.c	1 May 2017 20:16:44 -0000
> @@ -322,17 +322,10 @@ pmap_vp_enter(pmap_t pm, vaddr_t va, str
>  	struct pmapvp2 *vp2;
>  	struct pmapvp3 *vp3;
>  
> -	int vp_pool_flags;
> -	if (pm == pmap_kernel()) {
> -		vp_pool_flags = PR_NOWAIT;
> -	} else {
> -		vp_pool_flags = PR_WAITOK |PR_ZERO;
> -	}
> -
>  	if (pm->have_4_level_pt) {
>  		vp1 = pm->pm_vp.l0->vp[VP_IDX0(va)];
>  		if (vp1 == NULL) {
> -			vp1 = pool_get(&pmap_vp_pool, vp_pool_flags);
> +			vp1 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
>  			if (vp1 == NULL) {
>  				if ((flags & PMAP_CANFAIL) == 0)
>  					panic("%s: unable to allocate L1",
> @@ -347,7 +340,7 @@ pmap_vp_enter(pmap_t pm, vaddr_t va, str
>  
>  	vp2 = vp1->vp[VP_IDX1(va)];
>  	if (vp2 == NULL) {
> -		vp2 = pool_get(&pmap_vp_pool, vp_pool_flags);
> +		vp2 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
>  		if (vp2 == NULL) {
>  			if ((flags & PMAP_CANFAIL) == 0)
>  				panic("%s: unable to allocate L2", __func__);
> @@ -358,7 +351,7 @@ pmap_vp_enter(pmap_t pm, vaddr_t va, str
>  
>  	vp3 = vp2->vp[VP_IDX2(va)];
>  	if (vp3 == NULL) {
> -		vp3 = pool_get(&pmap_vp_pool, vp_pool_flags);
> +		vp3 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
>  		if (vp3 == NULL) {
>  			if ((flags & PMAP_CANFAIL) == 0)
>  				panic("%s: unable to allocate L3", __func__);
>

Dale Rahn	dr...@dalerahn.com
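
For anyone following along, here is a rough userland sketch of the
pattern the diff relies on: allocate without sleeping, and let the
caller's "can fail" flag decide between an error return and a panic.
It is not kernel code. Every name in it (sketch_table,
sketch_alloc_nowait, sketch_enter, SKETCH_CANFAIL, sketch_budget) is
made up for illustration, abort() stands in for panic(), and the
ENOMEM return is my assumption, since the diff context is cut off
before the actual return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a "caller can handle failure" flag. */
#define SKETCH_CANFAIL 0x1

/* Hypothetical page-table-like node with a few child slots. */
struct sketch_table {
	void *slot[4];
};

/* Simulated allocation budget; when it hits zero, allocation fails. */
static int sketch_budget = 2;

/*
 * Non-blocking allocator: returns a zeroed table, or NULL right away
 * when the budget is exhausted. It never waits for memory.
 */
static struct sketch_table *
sketch_alloc_nowait(void)
{
	if (sketch_budget <= 0)
		return NULL;
	sketch_budget--;
	return calloc(1, sizeof(struct sketch_table));
}

/*
 * Make sure parent->slot[idx] is populated.
 * Returns 0 on success. On allocation failure it returns ENOMEM if the
 * caller passed SKETCH_CANFAIL, and aborts otherwise.
 */
static int
sketch_enter(struct sketch_table *parent, size_t idx, int flags)
{
	struct sketch_table *t;

	assert(idx < 4);
	if (parent->slot[idx] == NULL) {
		t = sketch_alloc_nowait();
		if (t == NULL) {
			if ((flags & SKETCH_CANFAIL) == 0) {
				fprintf(stderr, "%s: unable to allocate\n",
				    __func__);
				abort();	/* stands in for panic() */
			}
			return ENOMEM;	/* assumed error value */
		}
		parent->slot[idx] = t;
	}
	return 0;
}
```

A caller that passes SKETCH_CANFAIL gets an error back and can release
whatever it holds before retrying, instead of sleeping inside the
allocator with locks held.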