On Mon, May 21, 2012 at 11:05 PM, Rafael Zalamena <rzalam...@gmail.com> wrote:
> On Mon, May 21, 2012 at 5:16 PM, Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>> On Thu, May 10, 2012 at 08:19:58PM -0300, Rafael Zalamena wrote:
>>> While I was configuring a new ALIX for my MPLS setup, a panic occurred
>>> while starting the LDPd daemon.
>>>
>>> Steps:
>>> 1. Configure all interfaces using /etc/hostname.*, then run 'sh /etc/netstart'
>>> 2. Configure ospfd.conf, then start it: ospfd -dv &
>>> 3. Configure ldpd.conf, then start it: ldpd -dv
>>> 4. Panic
>>>
>>> I'll send the ospfd.conf and ldpd.conf in the next mail. I'm using OpenBSD
>>> 5.1-release on all 3 ALIX now; it happened while I was setting up the
>>> last ALIX connected to the other two.
>>>
>>> p.s. note the scrambled print output of LDPd before dying.
>>>
>>
>>
>>> Panic log
>>> ===
>>> # ldpd -dv
>>> startup
>>> kernel add routeuvm_fault(0xd54e5bf4, 0x0, 0, 1) -> e
>>> 0.0.0.0/0
>>> kernkel add route 10.e0.3.0/24
>>> kernelr add route 10.0.n4.0/24
>>> kernel aedd route 10.0.10l.3/32
>>> kernel ad:d route 192.168. 3.0/24
>>> page fault trap, code=0
>>> Stopped at ifaof_ifpforaddr+0x26: movl 0x14(%edx),%edx
>>
>>> ddb> trace
>>> ifaof_ifpforaddr(d11884d8,0,0,d03e6afd,d09e1220) at ifaof_ifpforaddr+0x26
>>> ifa_ifwithroute(140003,d11884d8,d11884e8,0,d09e1220) at ifa_ifwithroute+0x61
>>> rt_getifa(d8c9acfc,0,d1188a0c,2,0) at rt_getifa+0xe2
>>> rtrequest1(1,d8c9acfc,8,d8c9ad54,0) at rtrequest1+0x5f7
>>> route_output(d54ebb00,d5358008,d54ebb00,0,0) at route_output+0xe29
>>> route_usrreq(d5358008,9,d54ebb00,0,0) at route_usrreq+0x65
>>> sosend(d5358008,0,d8c9aec0,d54ebb00,0) at sosend+0x456
>>> soo_write(d54d2370,d54d238c,d8c9aec0,d54f23c0,d54e44c8) at soo_write+0x3b
>>> dofilewritev(d54df680,4,d54d2370,cfbf3f40,3) at dofilewritev+0x131
>>> sys_writev(d54df680,d8c9af64,d8c9af84,d0576b0a,d54df680) at sys_writev+0x7c
>>> syscall() at syscall+0x26a
>>> --- syscall (number 0) ---
>>> 0x2:
>>> ddb>
>>
>> The ifp passed to ifaof_ifpforaddr() is NULL. How that can happen is
>> unclear to me; it seems like the found ifa is not valid anymore.
>> Is this crash easy to trigger? Can I get your hostname.* files,
>> ospfd.conf and ldpd.conf for all three boxes?
>>
>
> ALIX3: (this one panic'ed)
> ==> /etc/hostname.lo1
> 10.0.10.3/32
> ==> /etc/hostname.mpe0
> mplslabel 666
> 192.168.3.200/32
> ==> /etc/hostname.vr0
> 192.168.3.200/24
> ==> /etc/hostname.vr1
> 10.0.4.2/24 mpls
> ==> /etc/hostname.vr2
> 10.0.3.1/24 mpls
> ==> /etc/ospfd.conf
> router-id 10.0.10.3
>
> area 0.0.0.0 {
> interface vr0
> interface vr1
> interface vr2
> interface lo1
> }
> ==> /etc/ldpd.conf
> router-id 10.0.10.3
>
> interface vr1
> interface vr2
>
>
> ALIX2:
> ==> /etc/hostname.lo1
> 10.0.10.2/32
> ==> /etc/hostname.vr1
> 10.0.3.2/24 mpls
> ==> /etc/hostname.vr2
> 10.0.1.2/24 mpls
> ==> /etc/ospfd.conf
> router-id 10.0.10.2
>
> area 0.0.0.0 {
> interface vr1
> interface vr2
> interface lo1
> }
> ==> /etc/ldpd.conf
> router-id 10.0.10.2
>
> interface vr1
> interface vr2
>
>
> ALIX1:
> ==> /etc/hostname.lo1
> 10.0.10.1/32
> ==> /etc/hostname.mpe0
> mplslabel 666
> 192.168.1.200/32
> ==> /etc/hostname.vr0
> 192.168.1.200/24
> !route add default 192.168.1.254
> ==> /etc/hostname.vr1
> 10.0.1.1/24 mpls
> ==> /etc/hostname.vr2
> 10.0.2.1/24 mpls
> ==> /etc/ospfd.conf
> router-id 10.0.10.1
>
> area 0.0.0.0 {
> interface vr0
> interface vr1
> interface vr2
> interface lo1
> }
> ==> /etc/ldpd.conf
> router-id 10.0.10.1
>
> interface vr1
> interface vr2
>
>
> The setup topology is: http://dl.dropbox.com/u/222135/partial.png
> For more information about the setup, please see the "MPLS Setup" thread I made.
>
> Steps to reproduce:
> 1 - Configure ALIX1 interfaces, ospfd, ldpd
> 2 - Start interfaces and then daemons (ospfd first)
> 3 - Repeat for ALIX2 and ALIX3.
> 4 - While repeating the process for ALIX3 it panics.
>
> ALIX3 crashed while starting LDPd with the others running (maybe it's
> an event storm thing?). I might have forgotten something, but once
> everything is in place it doesn't happen anymore, so we can try to
> reproduce it by reconfiguring one of the hosts while the other ones
> are working.
>
> Configuration showing script:
> for i in /etc/hostname.*; do \
> echo "==> $i"; \
> cat "$i"; \
> done; \
> echo "==> /etc/ospfd.conf"; \
> cat /etc/ospfd.conf; \
> echo "==> /etc/ldpd.conf"; \
> cat /etc/ldpd.conf;
OK, after just a little bit of tinkering I've got something. After
booting up ALIX1, I ran a few commands, and here is what I got.

# ifconfig vr0 alias delete
# pkill ldpd
# ldpd -dv &
[1] 1730
# startup
]accept_add: acceuvm_fault(0xd54eb880, 0x0, 0, 1) -> e
pting on fd 11
kaccept_add: acceepting on fd 9
irf_act_start: intnerface vr2 link edown
if_fsm: evlent UP resulted :in action START and changing stapte for interfacea vr2 from DOWN tgo ACTIVE
if_fsme: event UP resul ted in action STfART and changinga state for interuface vr1 from DOlWN to ACTIVE
ketrnel add route 0 .0.0.0/0
kernelt add route 10.0.r1.0/24
kernel aadd route 10.0.1.p0/24
kernel add, route 10.0.2.0/ 24
kernel add rcoute 10.0.3.0/24o
eernel add roudte 10.0.10.1/32
kernel add rout=e 10.0.10.2/32
0kernel add route 10.0.10.3/32
Stopped at ifaof_ifpforaddr+0x26: movl 0x14(%edx),%edx

ddb> ps
   PID   PPID   PGRP   UID  S      FLAGS  WAIT       COMMAND
  6095   1730   1730    98  3       0x80  kqread     ldpd
  2761   1730   1730    98  3       0x80  kqread     ldpd
* 1730  11124   1730     0  7          0             ldpd
  9946  26755  26755     0  3       0x88  pause      sendmail
 26755   4320  26755     0  3       0x80  select     sendmail
 11124      1  11124     0  3       0x80  ttyin      ksh
 18447      1  18447     0  3       0x80  select     cron
 26945      1  26945    99  3       0x80  poll       sndiod
 13366      1  13366     0  3       0x80  select     inetd
  4320      1   4933     0  3       0x88  pause      sendmail
 29378  13835  13835    85  3       0x80  kqread     ospfd
  2733  13835  13835    85  3       0x80  kqread     ospfd
 13835      1  13835     0  3       0x80  kqread     ospfd
 24601      1  24601     0  3       0x80  select     sshd
 21983   2988   2988    74  3       0x80  bpf        pflogd
  2988      1   2988     0  3       0x80  netio      pflogd
 18275  13196  13196    73  2       0x80             syslogd
 13196      1  13196     0  3       0x80  netio      syslogd
  7697      1   7697     0  3       0x80  mfsidl     mount_mfs
 21567      1  21567     0  3       0x80  mfsidl     mount_mfs
 23010      1  23010     0  3       0x80  mfsidl     mount_mfs
    13      0      0     0  3   0x100200  aiodoned   aiodoned
    12      0      0     0  3   0x100200  syncer     update
    11      0      0     0  3   0x100200  cleaner    cleaner
    10      0      0     0  3   0x100200  reaper     reaper
     9      0      0     0  3   0x100200  pgdaemon   pagedaemon
     8      0      0     0  3   0x100200  bored      crypto
     7      0      0     0  3   0x100200  pftm       pfpurge
     6      0      0     0  3   0x100200  usbtsk     usbtask
     5      0      0     0  3   0x100200  usbatsk    usbatsk
     4      0      0     0  3   0x100200  bored      syswq
     3      0      0     0  3 0x40100200             idle0
     2      0      0     0  3   0x100200  kmalloc    kmthread
     1      0      1     0  3       0x80  wait       init
     0     -1      0     0  3      0x200  scheduler  swapper

ddb> trace
ifaof_ifpforaddr(d11effd8,0,0,d0519707,d11ef000) at ifaof_ifpforaddr+0x26
ifa_ifwithroute(140003,d11effd8,d11effe8,0,f37bec00) at ifa_ifwithroute+0x61
rt_getifa(f37becfc,0,f37bec8c,d03dacfc,40) at rt_getifa+0xe2
rtrequest1(1,f37becfc,8,f37bed54,0) at rtrequest1+0x5f7
route_output(d5508700,d523c444,d5508700,0,0) at route_output+0xe38
route_usrreq(d523c444,9,d5508700,0,0) at route_usrreq+0x65
sosend(d523c444,0,f37beec0,d5508700,0) at sosend+0x476
soo_write(d52371bc,d52371d8,f37beec0,d54fa5a0,cfcf0014) at soo_write+0x3b
dofilewritev(d526f45c,4,d52371bc,cfbcfbc0,3) at dofilewritev+0x131
sys_writev(d526f45c,f37bef64,f37bef84,d057b7da,d526f45c) at sys_writev+0x7c
syscall() at syscall+0x26a
--- syscall (number 0) ---
0x2:
ddb>
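
In case it helps anyone retry this, the three commands from the session above could be collected into a small script. This is only a sketch: the `run` and `repro` helpers and the `DRY_RUN` variable are hypothetical names that do not come from any tool. By default it only prints the commands, since running them for real may panic the box as shown above.

```shell
#!/bin/sh
# Hypothetical helper: replays the commands from the console session above.
# DRY_RUN is an assumed knob; it defaults to 1 so nothing is executed unless
# explicitly requested with DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "would run: $*"
	else
		"$@"
	fi
}

repro() {
	run ifconfig vr0 alias delete
	run pkill ldpd
	# ldpd -dv stays in the foreground, so it is backgrounded as in the session.
	if [ "$DRY_RUN" = "1" ]; then
		echo "would run: ldpd -dv &"
	else
		ldpd -dv &
	fi
}

repro
```

Running it as-is lists the three steps; `DRY_RUN=0 sh repro.sh` (the file name is also just an example) would execute them as root on a test machine with a serial console attached.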