Re: BIRD Crashes

Barry O'Donovan (INEX) Thu, 18 Aug 2022 10:12:58 -0700

Hi Ian, all,

Ian Chilton wrote on 18/08/2022 16:57:

We then run a "bird re-validate" cron job every hour (at twenty past the hour):
/usr/sbin/birdc -s /run/bird-ipv6.ctl reload in all > /dev/null ; /usr/sbin/birdc -s /run/bird-ipv4.ctl reload in all
Interestingly, all 3 crashes have happened just after twenty past the hour, i.e. soon after this cron job has run.

As you're running Bird 2.0.8, this should no longer be necessary. Per 2.0.8's release logs:


> Version 2.0.8 (2021-03-18)
>  o Automatic channel reloads based on RPKI changes

So given all three crashes appear linked to this, stopping those manual reloads should, hopefully, return you to stability.
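
As a rough sketch of that check: the helper below compares a version string against 2.0.8, the release whose notes mention automatic channel reloads. The `version_ge` helper and the `running` value are hypothetical; the version would be copied by hand from the daemon's own status output, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: value copied manually from the output of
#   /usr/sbin/birdc -s /run/bird-ipv4.ctl show status
running="2.0.8"

if version_ge "$running" "2.0.8"; then
    echo "2.0.8 or later: the hourly 'reload in all' cron job should be removable"
else
    echo "older than 2.0.8: keep the manual reload for now"
fi
```

Once the version is confirmed, the cron entry itself can simply be commented out rather than deleted, which makes it easy to restore.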

You're also two bugfix releases behind. At INEX we've been running 2.0.9 for ~5/6 months now without issue.

There appear to be a lot of bugfixes between 2.0.8 and 2.0.10, so it might be worthwhile updating or checking the git commit logs to see if there's anything relevant to RPKI in there?


hth,
 - Barry

It looks like the following in the logs:
Aug 17 17:20:01 rs1 CRON[29229]: (root) CMD (/usr/sbin/birdc -s /run/bird-ipv6.ctl reload in all > /dev/null ; /usr/sbin/birdc -s /run/bird-ipv4.ctl reload in all > /dev/null)
Aug 17 17:20:01 rs1 bird: Reloading protocol device1
Aug 17 17:20:01 rs1 bird: Reloading protocol pp_0121_asxx
..etc..
Aug 17 17:20:01 rs1 bird: Reloading protocol pp_1082_asxxxxxx
Aug 17 17:20:01 rs1 bird: Reloading protocol pb_1082_asxxxxxx
Aug 17 17:20:01 rs1 bird: Tagging invalid ROA 2001:xxxx:xxxx::/48 for ASN xxxxx
..etc..
Aug 17 17:21:17 rs1 bird: Tagging invalid ROA x.x.x.x/23 for ASN xxxx
Aug 17 17:21:19 rs1 kernel: [7811815.959943] bird[586]: segfault at f30021 ip 000055a1bf450fc3 sp 00007ffe64f3da98 error 4 in bird[55a1bf42a000+d8000]
Aug 17 17:21:19 rs1 kernel: [7811815.966760] Code: 95 78 01 00 00 5b 5d 41 5c c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff b8 01 00 00 00 74 15 48 85 f6 0f 84 a6 00 00 00 <0f> b6 46 21 0f b6 57 21 29 d0 74 11 f3 c3 0f 1f 44 00 00 66 2e 0f
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Main process exited, code=killed, status=11/SEGV
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Failed with result 'signal'.
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Service RestartSec=100ms expired, scheduling restart.
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Scheduled restart job, restart counter is at 1.
Aug 17 17:21:19 rs1 systemd[1]: Stopped BIRD - ipv4.
Aug 17 17:21:19 rs1 systemd[1]: Starting BIRD - ipv4...
Aug 17 17:21:22 rs1 systemd[1]: Started BIRD - ipv4.
Aug 17 17:21:22 rs1 bird: Started
When the second crash happened, we happened to be at RIPE84, so we chatted to Maria in person. She said that it was possible to debug it, but would need a core dump.
After looking into this, I did:

ulimit -S -c unlimited
and installed the systemd-coredump package.
...which was supposed to dump a core file if a process crashed. I tested this by killing a sleep command from the shell with kill -s 6 and it worked.
When the crash happened again yesterday, I hoped to have a core file to send, but there is no sign of it having generated one :(
Testing on a test server, killing sleep generates a core file, but not killing bird.
So two things - has anyone experienced similar crashes or have any ideas why we might be seeing this?
Can anyone advise how to reliably get a core dump if bird crashes?
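
A few things that may be worth checking here, as a sketch rather than a confirmed diagnosis. A `ulimit` set in a login shell applies to that shell and its children, not to a daemon started by systemd, so the limit that matters is the one on the bird process itself (settable with `LimitCORE=` in the unit). Separately, if bird is started as root and then switches user (an assumption about this setup), the kernel may refuse to dump it while `fs.suid_dumpable` is 0, which would fit sleep dumping core but bird not. The function names below are hypothetical helpers; the `/proc` paths and `coredumpctl` are standard.

```shell
#!/bin/sh
# Hypothetical helper: print the soft core-file limit of a running process.
core_limit_of() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Hypothetical helper: gather the settings that decide whether bird can dump core.
check_core_setup() {
    # Where the kernel sends core dumps (systemd-coredump registers a pipe here)
    echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
    # 0 means processes that changed uid/gid are not dumped
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
    # Limit on the daemon itself, which can differ from the shell's ulimit
    for pid in $(pidof bird); do
        echo "bird pid $pid: soft core limit $(core_limit_of "$pid")"
    done
    # Dumps already captured by systemd-coredump, if any
    coredumpctl list bird
}
```

Running `check_core_setup` as root on the route server prints all four pieces of information in one go; the same `kill -s 6` test used on sleep can then be repeated against bird on the test server after any change.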

Thanks!

Ian



--

Kind regards,
Barry O'Donovan
Consultant

For and on behalf of INEX

https://www.inex.ie/support/
+353 1 531 3339
