On Tue, May 29, 2012 at 10:00:53AM +0000, Matt Hamilton wrote:
> Stuart Henderson <stu <at> spacehopper.org> writes:
>
> > cron job to restart it, with a random delay to avoid two machines
> > coming back up at the same time when all the routers at a site
> > fail together...
>
> So you just check it every minute to see if it is alive?
>
> It seems to me to be a pretty fundamental design flaw in the software given
> its role. I would expect it to return sending a packet or something, not
> just exit.
>
> > > The first message below seems to indicate unable to allocate
> > > memory. I'm running these boxes pretty much stock having not tuned any
> > > parameters at all. Both are just running routing daemons (bgpd, ospf)
> > > and the 4.3 box is running OpenVPN. There are no applications running
> > > and both boxes have plenty of RAM (4GB) and not using any swap or
> > > anything.
> > >
> > > Is there something I should look at tuning in terms
> > > of memory allocation in order to stop this happening?
> >
> > Make sure login.conf memory limits for the daemon class (or the
> > _bgpd class on a newer OS version using /etc/rc.d) are high enough.
> > If your limits are insufficient for the size of routing table then
> > obviously you will have a problem. But also there is a bug
> > somewhere, possibly to do with nexthop changes, which can result
> > in very rapidly increasing memory use.
>
> Currently my routing table is pretty small. Only something like 150
> routes. This will increase once we start taking full feeds. At the moment
> we only have a few partial feeds from networks we peer with and everything
> else goes out a default route.
>
> I don't think it is a memory issue with the process itself, but the error
> message seems to be more related to memory available to send the packet.
> This is why I'm wondering if there is some sysctl or similar somewhere
> I should be tweaking.
>
> -Matt
The 4.x error and the 5.1 error are unrelated. Your first task should be
to upgrade the 4.x machine.

-Otto