I've been adding KTR debugging to try to track down the cause of this recurring problem (FYI: debug.mpsafenet=0 no longer works around it). To refresh your memory, here is the panic:

db> wh
Tracing pid 24 tid 100012 td 0xfffff802be9fa560
panic() at panic+0x164
rtfree() at rtfree+0xb4
nd6_na_output() at nd6_na_output+0x540
nd6_ns_input() at nd6_ns_input+0x738
icmp6_input() at icmp6_input+0xc38
ip6_input() at ip6_input+0x1038
netisr_processqueue() at netisr_processqueue+0x7c
swi_net() at swi_net+0xdc
ithread_execute_handlers() at ithread_execute_handlers+0x144
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
db>

It's always in nd6_na_output() although the trace beyond this point varies. However that doesn't tell us what leaked the reference count prior to this stack trace. So far I have narrowed it down to:

db> show ktr/v
Timestamp --v
9320 (0xfffff802be9fa560:cpu5) 1815572139270 net/route.c.247: Removing ref -> 0 0xfffff80227cefc20

^-- This is the cause of the panic in rtfree(), since it tries to decrement from 0.

9319 (0xfffff802be9fa560:cpu5) 1815572138338 netinet6/nd6_nbr.c.1028: Freeing route 0xfffff80227cefc20 with ref 0

^-- This is the call to rtfree() above, which is here at the end of nd6_na_output():

    if (ro.ro_rt) {         /* we don't cache this route. */
        RTFREE(ro.ro_rt);
    }
    return;

9318 (0xfffff802be9fa560:cpu5) 1815572070306 net/route.c.247: Removing ref -> 1 0xfffff80227cefc20

This is the previous time rtfree() was run

9317 (0xfffff802be9fa560:cpu5) 1815572068930 netinet6/in6_src.c.703: rtfree 0xfffff80227cefc20

^-- this is the call to rtfree in 9318, which is at the end of in6_selectif()

    if (rt && rt == sro.ro_rt)
        RTFREE(rt);
    return (0);

My next step is to add KTR logging to all the callers of in6_selectif() to backtrace another level, but perhaps someone has ideas what can be going wrong from the partial trace already.
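
The refcount arithmetic in the entries above can be replayed in user space. What follows is only a minimal sketch, not kernel code: `ref_model`, `ref_add`, `ref_remove` and `replay` are hypothetical names, and it assumes the number after "->" in each KTR line is the count before the operation, which is how the note on 9320 reads it ("tries to decrement from 0").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of one route's reference count. */
struct ref_model {
    long refcnt;
};

enum ref_op { REF_ADD, REF_REMOVE };

/* Take a reference; always succeeds in this model. */
static void ref_add(struct ref_model *m)
{
    assert(m != NULL);
    m->refcnt++;
}

/*
 * Drop a reference. Returns 0 on success, or -1 if the count is
 * already zero (the situation the trace above shows ending in a panic).
 */
static int ref_remove(struct ref_model *m)
{
    assert(m != NULL);
    if (m->refcnt <= 0)
        return -1;
    m->refcnt--;
    return 0;
}

/*
 * Replay ops in chronological order, starting from the given count.
 * Returns the index of the first op that would underflow, or -1 if
 * the whole sequence balances.
 */
static int replay(long initial, const enum ref_op *ops, size_t n)
{
    struct ref_model m = { initial };

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == REF_ADD)
            ref_add(&m);
        else if (ref_remove(&m) != 0)
            return (int)i;
    }
    return -1;
}
```

Starting from a count of 1 (the value logged at 9318) and replaying the two removals at 9318 and 9320, replay() returns 1: the second removal is the one with no matching reference, so whichever caller owns it is holding a pointer it never took a reference on, or one that was already released on its behalf.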

9316 (0xfffff802be9fa560:cpu5) 1815572067244 net/route.c.198: Adding ref -> 0 0xfffff80227cefc20

This is in rtalloc1():

    } else {
        KASSERT(rt == newrt, ("locking wrong route"));
        RT_LOCK(newrt);
        RT_ADDREF(newrt);

I suppose I need to also add KTR logging to the callers of rtalloc1().

9315 (0xfffff802be9fa560:cpu5) 1815572057262 netinet6/nd6.c.877: Removing ref -> 1 0xfffff80227cefc20

This is in nd6_lookup():

    }
    RT_LOCK_ASSERT(rt);
    RT_REMREF(rt);
    /*
     * Validation for the entry.
     * Note that the check for rt_llinfo is necessary because a cloned
     * route from a parent route that has the L flag (e.g. the default

NB: The RT_LOCK_ASSERT() is superfluous here since RT_REMREF() already asserts it.

9314 (0xfffff802be9fa560:cpu5) 1815572046008 net/route.c.198: Adding ref -> 0 0xfffff80227cefc20

Kris

P.S. This comment in netinet6/ip6_output.c appears to be bogus, since RTFREE is only a single statement:

    if (ro == &ip6route && ro->ro_rt) { /* brace necessary for RTFREE */
        RTFREE(ro->ro_rt);
    } else if (ro_pmtu == &ip6route && ro_pmtu->ro_rt) {
        RTFREE(ro_pmtu->ro_rt);
    }
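
The point in the P.S. can be checked in isolation. This is a minimal sketch, assuming the macro is wrapped in do { ... } while (0), the usual idiom for making a multi-statement macro expand to exactly one statement (the text above only says that RTFREE is a single statement). `struct obj`, `OBJ_RELEASE` and `release_one` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object with a reference count. */
struct obj {
    int refcnt;
};

/*
 * Hypothetical release macro. The do { ... } while (0) wrapper makes
 * the two statements inside expand to exactly one statement, so the
 * macro can sit directly under an if or an else without braces.
 */
#define OBJ_RELEASE(_o) do {        \
        (_o)->refcnt--;             \
        (_o) = NULL;                \
    } while (0)

/*
 * Release *a if pick_a is nonzero, otherwise release *b.
 * Neither macro call is braced; the else still binds to the if.
 */
static void release_one(struct obj **a, struct obj **b, int pick_a)
{
    assert(a != NULL && b != NULL);
    if (pick_a)
        OBJ_RELEASE(*a);
    else
        OBJ_RELEASE(*b);
}
```

Had the macro been written as a bare { ... } block instead, the semicolon after the first call would terminate the if and the else would fail to compile, which is the case a "brace necessary" comment would normally be guarding against.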