Dear Olivier,

I decided to do a fresh install again and test. After I enabled all neighbors, zebra caused a page fault, and this time a dump is available:
# kgdb debug/boot/kernel/kernel.debug vmcore.0
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
.......
......
Reading symbols from debug/boot/kernel/kernel.debug...done.

Unread portion of the kernel message buffer:
<3>rn_delete: couldn't find our annotation

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x70
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ac8bd5
stack pointer           = 0x28:0xfffffe0089c78550
frame pointer           = 0x28:0xfffffe0089c78650
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 74488 (zebra)
trap number             = 12
panic: page fault
cpuid = 2
time = 1556012813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0089c781f0
vpanic() at vpanic+0x1b4/frame 0xfffffe0089c78250
panic() at panic+0x43/frame 0xfffffe0089c782b0
trap_fatal() at trap_fatal+0x394/frame 0xfffffe0089c78310
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0089c78370
trap() at trap+0x29f/frame 0xfffffe0089c78480
calltrap() at calltrap+0x8/frame 0xfffffe0089c78480
--- trap 0xc, rip = 0xffffffff80ac8bd5, rsp = 0xfffffe0089c78550, rbp = 0xfffffe0089c78650 ---
rtrequest1_fib() at rtrequest1_fib+0x2b5/frame 0xfffffe0089c78650
route_output() at route_output+0xc7a/frame 0xfffffe0089c788d0
sosend_generic() at sosend_generic+0x51a/frame 0xfffffe0089c78980
sosend() at sosend+0x50/frame 0xfffffe0089c789b0
soo_write() at soo_write+0x32/frame 0xfffffe0089c789f0
dofilewrite() at dofilewrite+0x78/frame 0xfffffe0089c78a40
sys_write() at sys_write+0xc3/frame 0xfffffe0089c78ab0
amd64_syscall() at amd64_syscall+0x32f/frame 0xfffffe0089c78bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0089c78bf0
--- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8005bc22a, rsp = 0x7fffffffe1e8, rbp = 0x7fffffffe220 ---
KDB: enter: panic
Uptime: 1d0h23m12s
Dumping 750 out of 16337 MB:..3%..11%..22%..32%..41%..52%..62%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230     ./machine/pcpu.h: No such file or directory.
(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff809a61b0 in kern_reboot (howto=260) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff809a6650 in vpanic (fmt=<optimized out>, ap=0xfffffe0089c78290) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff809a6433 in panic (fmt=<unavailable>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff80de3a84 in trap_fatal (frame=0xfffffe0089c78490, eva=112) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:946
#6  0xffffffff80de3ae9 in trap_pfault (frame=0xfffffe0089c78490, usermode=0) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff80de30ef in trap (frame=0xfffffe0089c78490) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  rt_notifydelete (rt=0x0, info=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1251
#10 rtrequest1_fib (req=<optimized out>, info=0xfffffe0089c78700, ret_nrt=0xfffffe0089c787b8, fibnum=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1566
#11 0xffffffff80ace58a in route_output (m=<optimized out>, so=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/rtsock.c:723
#12 0xffffffff80a3aa6a in sosend_generic (so=0xfffff80006a70000, addr=0x0, uio=0xfffffe0089c78a50, top=0xfffff8034cd20d00, control=0x0, flags=<optimized out>, td=0xfffff80081451580) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/uipc_socket.c:1582
#13 0xffffffff80a3ae50 in sosend (so=0xfffff80004f38138, addr=0x0, uio=0xfffffe0089c783c0, top=0x1, control=0x0, flags=-1983413312, td=0xfffff80081451580) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/uipc_socket.c:1628
#14 0xffffffff80a16772 in soo_write (fp=<optimized out>, uio=0xfffffe0089c78a50, active_cred=<optimized out>, flags=<optimized out>, td=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_socket.c:148
#15 0xffffffff80a0e428 in fo_write (fp=<optimized out>, uio=<optimized out>, active_cred=0xfffffe0089c783c0, flags=<optimized out>, td=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/sys/file.h:314
#16 dofilewrite (td=0xfffff80081451580, fd=<optimized out>, fp=0xfffff80006a7d370, auio=0xfffffe0089c78a50, offset=<optimized out>, flags=0) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:567
#17 0xffffffff80a0e053 in kern_writev (td=<optimized out>, fd=7, auio=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:491
#18 sys_write (td=0xfffff80081451580, uap=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:406
#19 0xffffffff80de461f in syscallenter (td=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#20 amd64_syscall (td=0xfffff80081451580, traced=0) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:1171
#21 <signal handler called>
#22 0x00000008005bc22a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe1e8

The uptime shown is more than one day because no BGP sessions were enabled during that time.
Note that I am running kgdb on another machine.
Is there anything else that I can test?

Regards,

Lyubo

On Mon, 22 Apr 2019 at 13:15, Lyubomir Yotov <l.yo...@gmail.com> wrote:

> Hi Olivier,
>
> I did some tests and it appears that the system crashes after I enable
> more than one neighbour with a large number of prefixes.
> I have enabled the dump but it seems that the crash freezes the system
> before it dumps.
> Now, after I restart, there is nothing in /data/crash (I use
> the same setup for dumps as described on bsdrp.net).
> I tried without enabling the dump to a device and the crash still freezes
> the system.
>
> Any ideas?
>
> Regards,
>
> Lyubo
>
>
> On Thu, 11 Apr 2019 at 12:37, Lyubomir Yotov <l.yo...@gmail.com> wrote:
>
>> Hi Olivier,
>> Sorry for the late reply, but I have been busy these days.
>> I ran into several issues when disabling the sessions:
>> ---
>> router.bsdrp.net(config-router)# neighbor x.x.x.1 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.2 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.3 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.4 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.5 shutdown
>> router.bsdrp.net(config-router)#
>> router.bsdrp.net(config-router)#
>> router.bsdrp.net(config-router)# ^ZWarning: closing connection to zebra
>> because of an I/O error!
>> Warning: connecting to zebra...failed!
>>
>> user@router/#service frr status
>> zebra is not running.
>> ospfd is running as pid 1586.
>> ospf6d is running as pid 3340.
>> bgpd is running as pid 6100.
>> ---
>> Here are the logs (please disregard the date and time).
>> - from /var/log/messages:
>> ---
>> Aug 16 18:07:25 router kernel: pid 44117 (zebra), jid 0, uid 168: exited
>> on signal 6
>> ---
>> - from /var/log/bgpd.log
>> ---
>> 2008/08/16 18:07:25 ZEBRA: Kernel: message seq 1632715
>> 2008/08/16 18:07:25 ZEBRA: Kernel: pid 44117, rtm_addrs 0x7
>> 2008/08/16 18:07:25 ZEBRA: rtm_read: got rtm of type 2 (RTM_DELETE)
>> 2008/08/16 18:07:25 ZEBRA: Kernel: Len: 200 Type: RTM_DELETE
>> 2008/08/16 18:07:25 ZEBRA: Kernel: GATEWAY DONE PROTO1
>> 2008/08/16 18:07:25 ZEBRA: Kernel: message seq 1634039
>> 2008/08/16 18:07:25 ZEBRA: Kernel: pid 44117, rtm_addrs 0x7
>> 2008/08/16 18:07:25 ZEBRA: rtm_read: got rtm of type 2 (RTM_DELETE)
>> 2008/08/16 18:07:25 BGP: buffer_write: write error on fd 10: Broken pipe
>> 2008/08/16 18:07:25 BGP: zclient_send_message: buffer_write failed to
>> zclient fd 10, closing
>> ---
>> I decided to change the date and check again, and then I ran into
>> something else. After enabling the sessions and checking the neighbor
>> statistics, I decided to clear one of the sessions that was not starting:
>> ---
>> router.bsdrp.net# clear ip bgp x.xSegmentation fault
>> user has logged on pts/0 from y.y.y.y
>> ---
>> I used the "tab" key to complete the IP address in the above command.
>> In /var/log/messages I get:
>> ---
>> Apr 9 10:49:56 router kernel: pid 14177 (zebra), jid 0, uid 168: exited
>> on signal 6
>> Apr 9 10:50:35 router kernel: pid 54325 (vtysh), jid 0, uid 0: exited on
>> signal 11
>> ---
>> This time there is nothing in /var/log/bgpd.log.
>> ---
>> user@router/# service frr statusservice frr status
>> zebra is not running.
>> ospfd is running as pid 1586.
>> ospf6d is running as pid 3340.
>> bgpd is running as pid 6100.
>> ---
>> It could be a coincidence that vtysh failed together with zebrad
>> (it could be that the session was established at the moment I pressed the
>> 'tab' key).
>>
>> I will try once again, but this time I will enable some more debugging:
>> ---
>> router.bsdrp.net# show debugging
>> Zebra debugging status:
>> Zebra event debugging is on
>> Zebra packet debugging is on
>> Zebra kernel debugging is on
>> Zebra RIB debugging is on
>>
>> OSPF debugging status:
>>
>>
>> OSPF6 debugging status:
>>
>> BGP debugging status:
>> BGP zebra debugging is on
>> ---
>>
>> I will hopefully write tomorrow.
>>
>> Regards,
>>
>> Lyubo
>>
>> On Mon, 8 Apr 2019 at 20:39, Lyubomir Yotov <l.yo...@gmail.com> wrote:
>>
>>> Hi Olivier,
>>> I just couldn't think of anything else (except replacing frr with bird).
>>> You are absolutely right about the panic (as well as about the zebra
>>> daemon crash). This should not happen, regardless of the wrong
>>> configuration.
>>> The system is still working as expected (no traffic going in or out, as
>>> it is a test system, but zebrad has not crashed so far). Tomorrow I will
>>> check again with the original image and report back.
>>>
>>> Regards,
>>>
>>> Lyubomir
>>>
>>> On Mon, 8 Apr 2019 at 17:52, Olivier Cochard-Labbé <oliv...@cochard.me>
>>> wrote:
>>>
>>>> On Mon, Apr 8, 2019 at 3:37 PM Lyubomir Yotov <l.yo...@gmail.com>
>>>> wrote:
>>>>
>>>>> I actually found an error in the AS number (the last digit was missing).
>>>>> So far (more than an hour) it seems fine. As it has previously
>>>>> crashed only after several hours, I will wait until tomorrow.
>>>>> If everything is fine I will try again with a fresh install (on
>>>>> another flash drive) to check if the wrong config was the problem.
>>>>> Just for the record here is the "show version" from the installed frr6:
>>>>> router.bsdrp.net#show version
>>>>> FRRouting 6.0.2 (router.bsdrp.net).
>>>>>
>>>>>
>>>> Hi,
>>>> As you've seen, the binary version is still 6.0.2, because the port
>>>> revision "2" didn't modify the binary:
>>>> - frr6.0.2_1 : It was just an update of an RC script, so no change to
>>>> the FRR binaries
>>>> - frr6.0.2_2 : It was just a typo in the RC script's comment
>>>> So I don't think the resolution came from this upgrade.
>>>>
>>>> But in any case, even if the panic came from the wrong AS number, it
>>>> should not have triggered a panic.
>>>>
>>>> Regards,
>>>>
>>>> Olivier
>>>> _______________________________________________
>>>> Bsdrp-users mailing list
>>>> Bsdrp-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/bsdrp-users
>>>>
>>>