> With current default optmem_max values, this allows about 150 keys on
> 64bit arches, and 88 keys on 32bit arches.
I think you just solved it for me at least... I was seeing issues after
~147 BGP sessions, so running into this limit would make perfect sense.
I could reproduce it at will just by restarting BIRD, so my issue at
least wasn't any sort of leak.
On 8/25/2015 4:45 PM, Michael Vallaly wrote:
For context on my end; this issue was experienced on physical hardware
(64bit) with Intel 1Gbit NICs (no offloading).
We only noticed this after some length of time (> 180 days), during
which we likely had < 40 BGP session flaps on our end via Bird.
optmem_max: Maximum ancillary buffer size allowed per socket. Ancillary
data is a sequence of struct cmsghdr structures with appended data. The
default size is 10240 bytes.
According to Eric Dumazet back in 2012 [1]:
<snip>
There is no limit on number of MD5 keys an application can attach to a
tcp socket.
This patch adds a per tcp socket limit based
on /proc/sys/net/core/optmem_max
With current default optmem_max values, this allows about 150 keys on
64bit arches, and 88 keys on 32bit arches.
</snip>
Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP
session somehow?
-Mike
[1] https://patchwork.ozlabs.org/patch/138861/
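
The figures quoted from the patch can be turned into a rough sizing rule.
This is only a sketch: it assumes the 64-bit default of 20480 bytes (the
value given in the workaround comment elsewhere in this thread), assumes one
MD5 key per BGP session, and infers the per-key cost from "about 150 keys"
rather than reading it from kernel source. The function names and the
headroom factor are made up for illustration.

```python
import math

# Figures taken from this thread, not from kernel source.
DEFAULT_OPTMEM_MAX_64 = 20480  # bytes, 64-bit default per the workaround comment
APPROX_KEYS_64 = 150           # "about 150 keys on 64bit arches"


def implied_bytes_per_key(optmem_max: int, approx_keys: int) -> float:
    """Per-key cost implied by the quoted limit (an estimate, not a struct size)."""
    return optmem_max / approx_keys


def optmem_needed(sessions: int, bytes_per_key: float, headroom: float = 1.25) -> int:
    """Bytes of optmem_max needed for `sessions` keys, with some slack on top."""
    return math.ceil(sessions * bytes_per_key * headroom)


per_key = implied_bytes_per_key(DEFAULT_OPTMEM_MAX_64, APPROX_KEYS_64)
needed_for_200 = optmem_needed(200, per_key)
print(f"~{per_key:.0f} bytes per key, ~{needed_for_200} bytes for 200 sessions")
```

Under these assumptions, 200 sessions need more than the 20480-byte default
but fit within the 40960 used in the workaround.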
On Tue, 25 Aug 2015 15:48:44 -0400
Brian Rak <b...@gameservers.com> wrote:
I haven't tried the optmem_max option, but I did some more experimenting.
We have a virtual machine running a nearly identical BIRD config that's
not showing this issue.
The machine with the issue is physical, and has a Mellanox ConnectX
NIC. I'm wondering if there's some limitation with TCP offload there
that's responsible. Disabling TCP offload didn't seem to help though.
On 8/24/2015 4:59 PM, Michael Vallaly wrote:
I saw this problem back in 2013 on Bird 1.3.6 and 3.6+ kernels.
(Re: Strange MD5 Auth problem in BIRD 1.3.8)
AFAIK it was related to kernel socket option memory (or lack thereof)
and I can only surmise it was related to some sort of memory leak.
Ondrej Zajicek seemed to think this was an issue in the kernel itself,
but I wasn't able to prove that definitively.
I was able to work around it (without rebooting) by:
<snip>
echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
</snip>
That seemed to defer the issue long enough for us to reboot, or at least
not run into it constantly.
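
To confirm that the new value actually took effect, the setting can be read
back from /proc. A minimal sketch, assuming a Linux host; the helper names
are hypothetical and 40960 is simply the value from the echo command above:

```python
from pathlib import Path

OPTMEM_MAX_PATH = "/proc/sys/net/core/optmem_max"


def read_optmem_max(path: str = OPTMEM_MAX_PATH) -> int:
    """Return the current optmem_max in bytes."""
    return int(Path(path).read_text().strip())


def workaround_applied(expected: int = 40960, path: str = OPTMEM_MAX_PATH) -> bool:
    """True if the live value is at least the value written by the workaround."""
    return read_optmem_max(path) >= expected
```

Calling workaround_applied() after the echo should return True. The path is
a parameter only so the helper can be pointed at a test file.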
If anyone else has any details or info, I would still be interested in
the root-cause analysis and hopefully permanent fix.
-Mike
On Mon, 24 Aug 2015 15:59:06 -0400
Brian Rak <b...@gameservers.com> wrote:
I have a machine running BIRD 1.4.5, and I'm seeing a lot of these
messages when I start it up:
2015-08-24 15:54:26 <ERR> xxxx: Socket error: TCP_MD5SIG: Cannot allocate memory
2015-08-24 15:54:26 <ERR> yyyy: Socket error: TCP_MD5SIG: Cannot allocate memory
It also seems like the sessions that report that error do not come up,
and show a status of 'Error: Kernel MD5 auth failed'.
I'm only trying to configure around 200 BGP sessions here, most of which
are advertising a very small number of prefixes.
I don't really see any tunable settings here. Any suggestions as to how
I can correct this?
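
For anyone who wants to reproduce the error outside BIRD, the sketch below
attaches MD5 keys for made-up peer addresses to a single TCP socket until the
kernel returns ENOMEM. It is an illustration, not BIRD's code: it assumes
Linux with TCP MD5 support, assumes all keys land on one socket, and packs
struct tcp_md5sig by hand using the layout from <linux/tcp.h> (128-byte
address, flags, prefix length, key length, one 32-bit field, 80-byte key).
The helper names and the 10.x.y.z peer addresses are hypothetical.

```python
import errno
import socket
import struct

TCP_MD5SIG = 14            # option number from <linux/tcp.h>, defined here explicitly
TCP_MD5SIG_MAXKEYLEN = 80  # maximum key length from <linux/tcp.h>


def pack_md5sig(peer_ipv4: str, key: bytes) -> bytes:
    """Pack a struct tcp_md5sig for an IPv4 peer (layout assumed, see lead-in)."""
    if len(key) > TCP_MD5SIG_MAXKEYLEN:
        raise ValueError("key longer than TCP_MD5SIG_MAXKEYLEN")
    # sockaddr_in (family, port 0, address) at the start of a 128-byte sockaddr_storage
    addr = struct.pack("=HH4s", socket.AF_INET, 0, socket.inet_aton(peer_ipv4))
    addr = addr.ljust(128, b"\0")
    # flags (u8), prefixlen (u8), keylen (u16), ifindex or padding (u32), key[80]
    tail = struct.pack("=BBHI80s", 0, 0, len(key), 0, key)
    return addr + tail


def count_keys_until_enomem(limit: int = 1000) -> int:
    """Add one key per fake peer to one socket; return how many fit before ENOMEM."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        for i in range(limit):
            peer = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
            try:
                s.setsockopt(socket.IPPROTO_TCP, TCP_MD5SIG, pack_md5sig(peer, b"secret"))
            except OSError as exc:
                if exc.errno == errno.ENOMEM:
                    return i
                raise  # e.g. MD5 support not available in this kernel
    return limit
```

Running count_keys_until_enomem() before and after raising optmem_max should
show whether the number of keys that fit tracks the setting.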