On Tue, May 29, 2012 at 10:06:37AM +0000, Matt Hamilton wrote:

> Otto Moerbeek <otto <at> drijf.net> writes:
> 
> > 
> > On Tue, May 29, 2012 at 08:57:54AM +0000, Matt Hamilton wrote:
> > 
> > > Hi all,
> > > 
> > > More bgpd problems last night :( This happened last night on two of our
> > > routers. One running an old version of OpenBSD (4.3) and one running
> > > 5.1. Is there anyone out there actually using bgpd in production? How
> > > do you deal with it quitting every time something unexpected happens on
> > > the network?
> > 
> > Yes, lots of people run it in production. 
> 
> That is what I'd expect. I just don't understand how, when it keeps
> dropping out whenever it hits some transient problem.
> 
> > > 
> > > The first message below seems to indicate unable to allocate
> > > memory. I'm running these boxes pretty much stock having not tuned any
> > > parameters at all. Both are just running routing daemons (bgpd, ospfd)
> > > and the 4.3 box is running OpenVPN. There are no applications running
> > > and both boxes have plenty of RAM (4GB) and not using any swap or
> > > anything.
> > > 
> > > Is there something I should look at tuning in terms
> > > of memory allocation in order to stop this happening?
> > > 
> > > OpenBSD 4.3/amd64:
> > > 
> > > May 29 05:53:43 firewall1 bgpd[5090]: imsg_create: buf_open: Cannot
> > > allocate memory
> > > May 29 05:53:43 firewall1 bgpd[5090]: fatal in RDE: imsg_compose
> > > error: Cannot allocate memory
> > > May 29 05:53:44 firewall1 bgpd[27053]: Lost child: route decision
> > > engine exited
> > > May 29 05:53:44 firewall1 bgpd[15204]: fatal in SE: pipe write error:
> > > Broken pipe
> > 
> > Only solution: upgrading. You are running unsupported software, a
> > foolish thing to do.
> 
> Alas we don't all live in Utopia ;) This box is due to be upgraded soon, 
> but that upgrade is predicated on getting a stable routing environment
> so that I can do so. At the moment we are mid-way through migrating
> away from Cisco kit to OpenBSD routers. Until I can be confident that it
> won't all just fall over I can't continue with the migration.
> 
> So any insight on why I would be getting the same symptoms on the 5.1
> box? And was getting bgpd dying before under 5.0? I'm finding it hard

According to your previous message, you are getting different
behaviour on the 5.1 box. A segfault is not the same as running out of
memory.

As for the quitting problem: if a fatal error occurs, there is no
other choice than to quit. A fatal error means the process cannot
be trusted any more. This is unsatisfactory, but it is the only way.
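
To make the reasoning concrete, here is a minimal sketch of that
fail-fast pattern. It is not taken from the bgpd source; the names
buf_create, fatal_exit and compose_or_die are hypothetical and only
illustrate the idea that a failed allocation in a critical path ends
the process rather than letting it continue in an unknown state.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: try to allocate a message buffer of len bytes.
 * Returns NULL if the allocation fails (malloc sets errno on POSIX).
 */
static void *
buf_create(size_t len)
{
	return malloc(len);
}

/*
 * Hypothetical helper: log the reason for the failure and terminate.
 * After this point the caller's state is not trusted, so no recovery
 * is attempted.
 */
static void
fatal_exit(const char *what)
{
	assert(what != NULL);
	fprintf(stderr, "fatal: %s: %s\n", what, strerror(errno));
	exit(1);
}

/*
 * Hypothetical wrapper: return a usable buffer or do not return at all.
 * Callers never see a half-built message. len is assumed to be nonzero.
 */
static void *
compose_or_die(size_t len)
{
	void *buf = buf_create(len);

	if (buf == NULL)
		fatal_exit("buf_create");
	return buf;
}
```

A supervising parent process (or an external process monitor) can then
notice the exit and decide whether to restart the daemon; that part is
outside this sketch.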


> to believe that this behaviour would have been tolerated by people 
> running bgpd in production all the way from the time of 4.3 to now.
> Which leads to the only conclusion... I'm doing something stupid.
> The question is what. I have ospfd and bgpd running. On the 5.1 box
> there is also a CARP interface (not an interface we are using ospfd on).
> 
> -Matt

There have been earlier reports of bgpd running out of memory or
segfaulting. In some cases that led to bugs being fixed. There might
remain unsolved cases.

Working with the developers is one way of getting problems resolved.
Ranting about "I cannot believe this is happening" is not a
constructive way to get closer to the solution. 

        -Otto
