Dear Olivier,
I did a fresh install again and retested. After I enabled all neighbors,
zebra caused a kernel page fault, and this time a dump is available:

# kgdb debug/boot/kernel/kernel.debug vmcore.0
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
.......
......
Reading symbols from debug/boot/kernel/kernel.debug...done.

Unread portion of the kernel message buffer:
<3>rn_delete: couldn't find our annotation

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address    = 0x70
fault code        = supervisor read data  , page not present
instruction pointer    = 0x20:0xffffffff80ac8bd5
stack pointer            = 0x28:0xfffffe0089c78550
frame pointer            = 0x28:0xfffffe0089c78650
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 74488 (zebra)
trap number        = 12
panic: page fault
cpuid = 2
time = 1556012813
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0089c781f0
vpanic() at vpanic+0x1b4/frame 0xfffffe0089c78250
panic() at panic+0x43/frame 0xfffffe0089c782b0
trap_fatal() at trap_fatal+0x394/frame 0xfffffe0089c78310
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0089c78370
trap() at trap+0x29f/frame 0xfffffe0089c78480
calltrap() at calltrap+0x8/frame 0xfffffe0089c78480
--- trap 0xc, rip = 0xffffffff80ac8bd5, rsp = 0xfffffe0089c78550, rbp = 0xfffffe0089c78650 ---
rtrequest1_fib() at rtrequest1_fib+0x2b5/frame 0xfffffe0089c78650
route_output() at route_output+0xc7a/frame 0xfffffe0089c788d0
sosend_generic() at sosend_generic+0x51a/frame 0xfffffe0089c78980
sosend() at sosend+0x50/frame 0xfffffe0089c789b0
soo_write() at soo_write+0x32/frame 0xfffffe0089c789f0
dofilewrite() at dofilewrite+0x78/frame 0xfffffe0089c78a40
sys_write() at sys_write+0xc3/frame 0xfffffe0089c78ab0
amd64_syscall() at amd64_syscall+0x32f/frame 0xfffffe0089c78bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0089c78bf0
--- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8005bc22a, rsp = 0x7fffffffe1e8, rbp = 0x7fffffffe220 ---
KDB: enter: panic
Uptime: 1d0h23m12s
Dumping 750 out of 16337 MB:..3%..11%..22%..32%..41%..52%..62%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230    ./machine/pcpu.h: No such file or directory.
(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff809a61b0 in kern_reboot (howto=260)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff809a6650 in vpanic (fmt=<optimized out>, ap=0xfffffe0089c78290)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff809a6433 in panic (fmt=<unavailable>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff80de3a84 in trap_fatal (frame=0xfffffe0089c78490, eva=112)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:946
#6  0xffffffff80de3ae9 in trap_pfault (frame=0xfffffe0089c78490, usermode=0)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff80de30ef in trap (frame=0xfffffe0089c78490)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  rt_notifydelete (rt=0x0, info=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1251
#10 rtrequest1_fib (req=<optimized out>, info=0xfffffe0089c78700,
    ret_nrt=0xfffffe0089c787b8, fibnum=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1566
#11 0xffffffff80ace58a in route_output (m=<optimized out>, so=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/rtsock.c:723
#12 0xffffffff80a3aa6a in sosend_generic (so=0xfffff80006a70000, addr=0x0,
uio=0xfffffe0089c78a50, top=0xfffff8034cd20d00, control=0x0,
flags=<optimized out>, td=0xfffff80081451580) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/uipc_socket.c:1582
#13 0xffffffff80a3ae50 in sosend (so=0xfffff80004f38138, addr=0x0,
uio=0xfffffe0089c783c0, top=0x1, control=0x0, flags=-1983413312,
td=0xfffff80081451580) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/uipc_socket.c:1628
#14 0xffffffff80a16772 in soo_write (fp=<optimized out>,
uio=0xfffffe0089c78a50, active_cred=<optimized out>, flags=<optimized out>,
td=<optimized out>) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_socket.c:148
#15 0xffffffff80a0e428 in fo_write (fp=<optimized out>, uio=<optimized out>, active_cred=0xfffffe0089c783c0, flags=<optimized out>, td=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/sys/file.h:314
#16 dofilewrite (td=0xfffff80081451580, fd=<optimized out>,
fp=0xfffff80006a7d370, auio=0xfffffe0089c78a50, offset=<optimized out>,
flags=0) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:567
#17 0xffffffff80a0e053 in kern_writev (td=<optimized out>, fd=7,
auio=<optimized out>) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:491
#18 sys_write (td=0xfffff80081451580, uap=<optimized out>) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/sys_generic.c:406
#19 0xffffffff80de461f in syscallenter (td=<optimized out>) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#20 amd64_syscall (td=0xfffff80081451580, traced=0) at
/usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:1171
#21 <signal handler called>
#22 0x00000008005bc22a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe1e8

The uptime you see is more than 1 day because no BGP sessions were
enabled.
Note that I am running kgdb on another machine.

Anything else that I can test?

Regards,

Lyubo

On Mon, 22 Apr 2019 at 13:15, Lyubomir Yotov <l.yo...@gmail.com> wrote:

> Hi Olivier,
>
> I did some tests and it appears that the system crashes after I enable
> more than one neighbour with a large number of prefixes.
> I have enabled the dump, but it seems that the crash freezes the system
> before it can dump. After a restart there is nothing in /data/crash (I use
> the same dump setup as described on bsdrp.net).
> I also tried without enabling the dump to a device, and the crash still
> freezes the system.
>
> Any ideas?
>
> Regards,
>
> Lyubo
>
>
> On Thu, 11 Apr 2019 at 12:37, Lyubomir Yotov <l.yo...@gmail.com> wrote:
>
>> Hi Olivier,
>> Sorry for the late reply, but I have been busy these days.
>> I ran into several issues when disabling the sessions:
>> ---
>> router.bsdrp.net(config-router)# neighbor x.x.x.1 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.2 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.3 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.4 shutdown
>> router.bsdrp.net(config-router)# neighbor x.x.x.5 shutdown
>> router.bsdrp.net(config-router)#
>> router.bsdrp.net(config-router)#
>> router.bsdrp.net(config-router)# ^ZWarning: closing connection to zebra
>> because of an I/O error!
>> Warning: connecting to zebra...failed!
>>
>> user@router/#service frr status
>> zebra is not running.
>> ospfd is running as pid 1586.
>> ospf6d is running as pid 3340.
>> bgpd is running as pid 6100.
>> ---
>> Here are the logs (please disregard the date and time).
>> - from /var/log/messages:
>> ---
>> Aug 16 18:07:25 router kernel: pid 44117 (zebra), jid 0, uid 168: exited
>> on signal 6
>> ---
>> - from /var/log/bgpd.log
>> ---
>> 2008/08/16 18:07:25 ZEBRA: Kernel: message seq 1632715
>> 2008/08/16 18:07:25 ZEBRA: Kernel: pid 44117, rtm_addrs 0x7
>> 2008/08/16 18:07:25 ZEBRA: rtm_read: got rtm of type 2 (RTM_DELETE)
>> 2008/08/16 18:07:25 ZEBRA: Kernel: Len: 200 Type: RTM_DELETE
>> 2008/08/16 18:07:25 ZEBRA: Kernel: GATEWAY DONE PROTO1
>> 2008/08/16 18:07:25 ZEBRA: Kernel: message seq 1634039
>> 2008/08/16 18:07:25 ZEBRA: Kernel: pid 44117, rtm_addrs 0x7
>> 2008/08/16 18:07:25 ZEBRA: rtm_read: got rtm of type 2 (RTM_DELETE)
>> 2008/08/16 18:07:25 BGP: buffer_write: write error on fd 10: Broken pipe
>> 2008/08/16 18:07:25 BGP: zclient_send_message: buffer_write failed to
>> zclient fd 10, closing
>> ---
>> I decided to change the date and check again, and then I ran into
>> something else. After enabling the sessions and checking the neighbor
>> statistics, I decided to clear one of the sessions that was not starting:
>> ---
>> router.bsdrp.net# clear ip bgp x.xSegmentation fault
>> user has logged on pts/0 from y.y.y.y
>> ---
>> I used the "tab" key to complete the ip address in the above command.
>> In /var/log/messages I get:
>> ---
>> Apr  9 10:49:56 router kernel: pid 14177 (zebra), jid 0, uid 168: exited
>> on signal 6
>> Apr  9 10:50:35 router kernel: pid 54325 (vtysh), jid 0, uid 0: exited on
>> signal 11
>> ---
>> This time there is nothing in /var/log/bgpd.log.
>> ---
>> user@router/# service frr status
>> zebra is not running.
>> ospfd is running as pid 1586.
>> ospf6d is running as pid 3340.
>> bgpd is running as pid 6100.
>> ---
>> It could be a coincidence that vtysh failed together with zebra
>> (perhaps the session was established at the moment I pressed the
>> 'tab' key).
>>
>> I will try once more, but this time with some more debugging enabled:
>> ---
>> router.bsdrp.net# show debugging
>> Zebra debugging status:
>>   Zebra event debugging is on
>>   Zebra packet debugging is on
>>   Zebra kernel debugging is on
>>   Zebra RIB debugging is on
>>
>> OSPF debugging status:
>>
>>
>> OSPF6 debugging status:
>>
>> BGP debugging status:
>>   BGP zebra debugging is on
>> ---
>>
>> I will hopefully write again tomorrow.
>>
>> Regards,
>>
>> Lyubo
>>
>> On Mon, 8 Apr 2019 at 20:39, Lyubomir Yotov <l.yo...@gmail.com> wrote:
>>
>>> Hi Olivier,
>>> I just couldn't think of anything else (except replacing frr with bird).
>>> You are absolutely right about the panic (as well as about the zebra
>>> daemon crash). This should not happen, regardless of the wrong configuration.
>>> The system is still working as expected (no traffic going in or out, as
>>> it is a test system, but zebra has not crashed so far). Tomorrow I will
>>> check again with the original image and report back.
>>> Regards,
>>>
>>> Lyubomir
>>>
>>> On Mon, 8 Apr 2019 at 17:52, Olivier Cochard-Labbé <oliv...@cochard.me>
>>> wrote:
>>>
>>>> On Mon, Apr 8, 2019 at 3:37 PM Lyubomir Yotov <l.yo...@gmail.com>
>>>> wrote:
>>>>
>>>>> I actually found an error in the AS number (last digit was missing).
>>>>> So far (more than an hour) it seems fine. Since it has crashed after
>>>>> several hours before, I will wait until tomorrow.
>>>>> If everything is fine I will try again with a fresh install (on
>>>>> another flash drive) to check if the wrong config was the problem.
>>>>> Just for the record here is the "show version" from the installed frr6:
>>>>> router.bsdrp.net#show version
>>>>> FRRouting 6.0.2 (router.bsdrp.net).
>>>>>
>>>>>
>>>> Hi,
>>>> As you've seen, the binary version is still 6.0.2, because port
>>>> revision "2" didn't modify the binary:
>>>> - frr6.0.2_1 : It was just an update of an RC script, so no change to the
>>>> FRR binaries
>>>> - frr6.0.2_2 : It was just a typo in the RC script's comment
>>>> So I don't think the resolution came from this upgrade.
>>>>
>>>> But in any case, even if the panic came from the wrong AS number, it
>>>> should not have triggered a panic.
>>>>
>>>> Regards,
>>>>
>>>> Olivier
>>>> _______________________________________________
>>>> Bsdrp-users mailing list
>>>> Bsdrp-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/bsdrp-users
>>>>
>>>