On Mon, May 21, 2012 at 11:20 PM, Rafael Zalamena <rzalam...@gmail.com> wrote:
> On Mon, May 21, 2012 at 11:05 PM, Rafael Zalamena <rzalam...@gmail.com> wrote:
>> On Mon, May 21, 2012 at 5:16 PM, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>>> On Thu, May 10, 2012 at 08:19:58PM -0300, Rafael Zalamena wrote:
>>>> ...
>>> The ifp passed to ifaof_ifpforaddr() is NULL. How that can happen is
>>> unclear to me, it seems like the found ifa is not valid anymore.
>>> Is this crash easy to trigger? Can I get your hostname.* files,
>>> ospfd.conf and ldpd.conf for all three boxes?
>>>
>> ...
>>
>>
>> ALIX1:
>> ==> /etc/hostname.lo1
>> 10.0.10.1/32
>> ==> /etc/hostname.mpe0
>> mplslabel 666
>> 192.168.1.200/32
>> ==> /etc/hostname.vr0
>> 192.168.1.200/24
>> !route add default 192.168.1.254
>> ==> /etc/hostname.vr1
>> 10.0.1.1/24 mpls
>> ==> /etc/hostname.vr2
>> 10.0.2.1/24 mpls
>> ==> /etc/ospfd.conf
>> router-id 10.0.10.1
>>
>> area 0.0.0.0 {
>>        interface vr0
>>        interface vr1
>>        interface vr2
>>        interface lo1
>> }
>> ==> /etc/ldpd.conf
>> router-id 10.0.10.1
>>
>> interface vr1
>> interface vr2
>>
>>
>> The setup topology is: http://dl.dropbox.com/u/222135/partial.png
>> For more information about the setup, please see the "MPLS Setup" thread I
>> made.
>>
>> Steps to reproduce:
>> 1 - Configure ALIX1 interfaces, ospf, ldpd
>> 2 - Start interfaces and then daemons (ospf first)
>> 3 - Repeat for ALIX2 and ALIX3.
>> 4 - While repeating the process for ALIX3 it panics.
>>
>> ALIX3 crashed while starting LDPd with the others running (maybe it's
>> an event storm thing?). I might have forgotten something, but once
>> everything is in place it doesn't happen anymore, so we can try to
>> reproduce it by reconfiguring one of the hosts while the other ones
>> are working.
>>
>> ...
>
> OK, after just a little bit of tinkering I've got something.
>
> After booting up ALIX1, I ran some commands and here is what I got.
>
> # ifconfig vr0 alias delete
> # pkill ldpd
> # ldpd -dv &
> [1] 1730
> # startup
> ]accept_add: acceuvm_fault(0xd54eb880, 0x0, 0, 1) -> e
> pting on fd 11
> kaccept_add: acceepting on fd 9
> irf_act_start: intnerface vr2 link edown
> if_fsm: evlent UP resulted :in action START  and changing stapte for
> interfacea vr2 from DOWN tgo ACTIVE
> if_fsme: event UP resul ted in action STfART and changinga state for
> interuface vr1 from DOlWN to ACTIVE
> ketrnel add route 0 .0.0.0/0
> kernelt add route 10.0.r1.0/24
> kernel aadd route 10.0.1.p0/24
> kernel add, route 10.0.2.0/ 24
> kernel add rcoute 10.0.3.0/24o
> eernel add roudte 10.0.10.1/32
>  kernel add rout=e 10.0.10.2/32
> 0kernel add route
>                  10.0.10.3/32
> Stopped at      ifaof_ifpforaddr+0x26:  movl    0x14(%edx),%edx
> ddb> ps
>   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>  6095   1730   1730     98  3        0x80  kqread        ldpd
>  2761   1730   1730     98  3        0x80  kqread        ldpd
> * 1730  11124   1730      0  7           0                ldpd
>  9946  26755  26755      0  3        0x88  pause         sendmail
>  26755   4320  26755      0  3        0x80  select        sendmail
>  11124      1  11124      0  3        0x80  ttyin         ksh
>  18447      1  18447      0  3        0x80  select        cron
>  26945      1  26945     99  3        0x80  poll          sndiod
>  13366      1  13366      0  3        0x80  select        inetd
>  4320      1   4933      0  3        0x88  pause         sendmail
>  29378  13835  13835     85  3        0x80  kqread        ospfd
>  2733  13835  13835     85  3        0x80  kqread        ospfd
>  13835      1  13835      0  3        0x80  kqread        ospfd
>  24601      1  24601      0  3        0x80  select        sshd
>  21983   2988   2988     74  3        0x80  bpf           pflogd
>  2988      1   2988      0  3        0x80  netio         pflogd
>  18275  13196  13196     73  2        0x80                syslogd
>  13196      1  13196      0  3        0x80  netio         syslogd
>  7697      1   7697      0  3        0x80  mfsidl        mount_mfs
>  21567      1  21567      0  3        0x80  mfsidl        mount_mfs
>  23010      1  23010      0  3        0x80  mfsidl        mount_mfs
>    13      0      0      0  3    0x100200  aiodoned      aiodoned
>    12      0      0      0  3    0x100200  syncer        update
>    11      0      0      0  3    0x100200  cleaner       cleaner
>    10      0      0      0  3    0x100200  reaper        reaper
>     9      0      0      0  3    0x100200  pgdaemon      pagedaemon
>     8      0      0      0  3    0x100200  bored         crypto
>     7      0      0      0  3    0x100200  pftm          pfpurge
>     6      0      0      0  3    0x100200  usbtsk        usbtask
>     5      0      0      0  3    0x100200  usbatsk       usbatsk
>     4      0      0      0  3    0x100200  bored         syswq
>     3      0      0      0  3  0x40100200                idle0
>     2      0      0      0  3    0x100200  kmalloc       kmthread
>     1      0      1      0  3        0x80  wait          init
>     0     -1      0      0  3       0x200  scheduler     swapper
> ddb> trace
> ifaof_ifpforaddr(d11effd8,0,0,d0519707,d11ef000) at ifaof_ifpforaddr+0x26
> ifa_ifwithroute(140003,d11effd8,d11effe8,0,f37bec00) at ifa_ifwithroute+0x61
> rt_getifa(f37becfc,0,f37bec8c,d03dacfc,40) at rt_getifa+0xe2
> rtrequest1(1,f37becfc,8,f37bed54,0) at rtrequest1+0x5f7
> route_output(d5508700,d523c444,d5508700,0,0) at route_output+0xe38
> route_usrreq(d523c444,9,d5508700,0,0) at route_usrreq+0x65
> sosend(d523c444,0,f37beec0,d5508700,0) at sosend+0x476
> soo_write(d52371bc,d52371d8,f37beec0,d54fa5a0,cfcf0014) at soo_write+0x3b
> dofilewritev(d526f45c,4,d52371bc,cfbcfbc0,3) at dofilewritev+0x131
> sys_writev(d526f45c,f37bef64,f37bef84,d057b7da,d526f45c) at sys_writev+0x7c
> syscall() at syscall+0x26a
> --- syscall (number 0) ---
> 0x2:
> ddb>

I cleaned up the quotes from the e-mail to keep only what's necessary.

I've made a diff that avoids the panic, but it does not solve the
underlying problem. From what I could investigate in the time I had,
the panic happens because routes were still referencing vr0 at a
moment when no alias was configured on that interface.

The diff below avoids the panic by returning early when the route's
interface address no longer points back to the interface it belonged
to. However, while it fixes the panic, it also causes LDPd to print an
error-handling message saying that something is wrong with the routes
left pointing to vr0.


Index: sys/net/route.c
===================================================================
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.136
diff -u -p -r1.136 route.c
--- sys/net/route.c     9 May 2012 06:50:55 -0000       1.136
+++ sys/net/route.c     23 May 2012 12:12:01 -0000
@@ -646,6 +646,9 @@ ifa_ifwithroute(int flags, struct sockad
                if ((ifa = rt->rt_ifa) == NULL)
                        return (NULL);
        }
+       /* Don't search interface addresses if there is no pointer back. */
+       if (ifa->ifa_ifp == NULL)
+               return (NULL);
        if (ifa->ifa_addr->sa_family != dst->sa_family) {
                struct ifaddr   *oifa = ifa;
                ifa = ifaof_ifpforaddr(dst, ifa->ifa_ifp);
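
To illustrate why the early return avoids the fault, here is a minimal
userland sketch of the same guard. It is not the kernel code: the mock_*
structures, fields and functions are hypothetical stand-ins invented for
this example, and the -1 return value only plays the role of the NULL
return in the diff. It assumes, as the trace suggests, that the callee
dereferences the interface pointer without checking it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an interface. */
struct mock_ifnet {
	int	if_naddrs;	/* number of addresses on the interface */
};

/* Hypothetical stand-in for an interface address. */
struct mock_ifaddr {
	int			 ifa_family;	/* models ifa_addr->sa_family */
	struct mock_ifnet	*ifa_ifp;	/* back pointer, may be NULL */
};

/*
 * Models a callee that dereferences ifp unconditionally, which is what
 * appears to fault in the trace when ifp is NULL.
 */
int
mock_addr_count(struct mock_ifnet *ifp)
{
	assert(ifp != NULL);	/* precondition the caller must ensure */
	return (ifp->if_naddrs);
}

/*
 * Models the guarded caller: bail out with -1 when the address has no
 * pointer back to an interface, instead of passing NULL to the callee.
 */
int
mock_lookup(struct mock_ifaddr *ifa, int dst_family)
{
	if (ifa == NULL || ifa->ifa_ifp == NULL)
		return (-1);
	if (ifa->ifa_family != dst_family)
		return (mock_addr_count(ifa->ifa_ifp));
	return (0);
}
```

With an address whose ifa_ifp is NULL, mock_lookup() returns -1 and never
reaches mock_addr_count(). As with the diff, this only hides the symptom:
the stale reference is still there and the caller has to cope with the
failed lookup.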
