On Mon, Mar 06, 2006 at 06:15:56PM -0500, Kris Kennaway wrote:
> I've been adding KTR debugging to try and track down the cause of this
> recurring problem (FYI: debug.mpsafenet=0 is no longer working around
> it).  To refresh your memory, here is the panic:
>
> db> wh
> Tracing pid 24 tid 100012 td 0xfffff802be9fa560
> panic() at panic+0x164
> rtfree() at rtfree+0xb4
> nd6_na_output() at nd6_na_output+0x540
> nd6_ns_input() at nd6_ns_input+0x738
> icmp6_input() at icmp6_input+0xc38
> ip6_input() at ip6_input+0x1038
> netisr_processqueue() at netisr_processqueue+0x7c
> swi_net() at swi_net+0xdc
> ithread_execute_handlers() at ithread_execute_handlers+0x144
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
> db>
>
> It's always in nd6_na_output() although the trace beyond this point
> varies.  However that doesn't tell us what leaked the reference count
> prior to this stack trace.
>
> So far I have narrowed it down to:

Here is a better trace (in chronological order):

4431 (0xfffff803fe9f1ae0:cpu0) 16217304555013 netinet6/nd6_nbr.c.461: in6_selectsrc 0xe2e0b380

nd6_ns_output():

        src = in6_selectsrc(&dst_sa, NULL, NULL, &ro, NULL, NULL, &error);

4432 (0xfffff803fe9f1ae0:cpu0) 16217304555999 netinet6/in6_src.c.241: in6_selectif 0xe2e0b380

in6_selectsrc():

        /*
         * If the address is not specified, choose the best one based on
         * the outgoing interface and the destination address.
         */
        /* get the outgoing interface */
        if ((*errorp = in6_selectif(dstsock, opts, mopts, ro, &ifp)) != 0)
                return (NULL);

in6_selectif() calls selectroute():

        if ((error = selectroute(dstsock, opts, mopts, ro, retifp,
            &rt, 0, 1)) != 0) {

4433 (0xfffff803fe9f1ae0:cpu0) 16217304558555 net/route.c.198: Adding ref 0 0xfffff8032240dd10
4434 (0xfffff803fe9f1ae0:cpu0) 16217304559191 netinet6/in6_src.c.579: rtalloc1 0xfffff8032240dd10

This rtalloc1() was called from selectroute():

        if (ro->ro_rt == (struct rtentry *)NULL) {
                struct sockaddr_in6 *sa6;

                /* No route yet, so try to acquire one */
                bzero(&ro->ro_dst, sizeof(struct sockaddr_in6));
                sa6 = (struct sockaddr_in6 *)&ro->ro_dst;
                *sa6 = *dstsock;
                sa6->sin6_scope_id = 0;
                if (clone) {
                        rtalloc((struct route *)ro);
                } else {
                        ro->ro_rt = rtalloc1(&((struct route *)ro)
                            ->ro_dst, 0, 0UL);

4435 (0xfffff803fe9f1ae0:cpu0) 16217304560255 netinet6/in6_src.c.706: rtfree 0xfffff8032240dd10
4436 (0xfffff803fe9f1ae0:cpu0) 16217304560951 net/route.c.247: Removing ref 1 0xfffff8032240dd10

We are now back at the end of in6_selectif():

        if (rt && rt == sro.ro_rt)
                RTFREE(rt);
        return (0);

4437 (0xfffff803fe9f1ae0:cpu0) 16217304590486 netinet6/nd6_nbr.c.534: 1 Freeing route 0xfffff8032240dd10 with ref 0

We are now back in nd6_ns_output():

        if (ro.ro_rt) {         /* we don't cache this route. */
                RTFREE(ro.ro_rt);
        }
        return;

4438 (0xfffff803fe9f1ae0:cpu0) 16217417726681 net/route.c.247: Removing ref 0 0xfffff8032240dd10

and explode because we've freed the same route twice in a row when it
only had a refcount of 1 to begin with.

I suspect the control flow in nd6_ns_output() is broken.

Kris
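
For illustration only, here is a minimal userland sketch of the
reference-count sequence in the trace above: one reference taken in
selectroute(), one release at the end of in6_selectif(), and a second
release in nd6_ns_output().  It is not kernel code.  struct toy_rtentry
and the toy_* functions are hypothetical stand-ins, and the model sets a
flag at the point where the real rtfree() would panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtentry: only the fields the model needs. */
struct toy_rtentry {
        int refcnt;     /* outstanding references */
        int underflow;  /* set where the real rtfree() would panic */
};

/* Models the "Adding ref N" KTR line: take one reference. */
static void
toy_rt_ref(struct toy_rtentry *rt)
{
        assert(rt != NULL);
        rt->refcnt++;
}

/*
 * Models RTFREE(): drop one reference.  Releasing an entry that has no
 * references left is recorded as an underflow instead of panicking.
 */
static void
toy_rt_free(struct toy_rtentry *rt)
{
        assert(rt != NULL);
        if (rt->refcnt <= 0) {
                rt->underflow = 1;
                return;
        }
        rt->refcnt--;
}

/*
 * Replays KTR entries 4433 through 4438 against the model.
 * Returns 1 if the final release underflowed the count, else 0.
 */
static int
toy_replay_trace(void)
{
        struct toy_rtentry rt = { 0, 0 };

        toy_rt_ref(&rt);        /* selectroute(): rtalloc1(), ref 0 -> 1 */
        toy_rt_free(&rt);       /* in6_selectif(): RTFREE(rt), ref 1 -> 0 */
        toy_rt_free(&rt);       /* nd6_ns_output(): RTFREE(ro.ro_rt), ref 0 */
        return (rt.underflow);
}
```

In this model toy_replay_trace() reports an underflow because two
releases are paired with a single acquire.  Which of the two RTFREE()
calls is the wrong one in the real code is not something the sketch can
decide.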