Hi,
On 4/4/24 6:12 AM, Ilya Maximets wrote:
On 4/3/24 22:15, Brian Haley via discuss wrote:
Hi,
I have recently been seeing issues in a large environment where the
listen backlog of ovsdb-server, both NB and SB, is overflowing,
for example:
17842 times the listen queue of a socket overflowed
17842 SYNs to LISTEN sockets dropped
Does this cause significant re-connection delays or is it just an
observation?
It is just an observation at this point.
There are more on NB than SB, but I was surprised to see any. I can only
guess at the moment that it is happening when the leader changes and hundreds
of nodes try to reconnect.
This sounds a little strange. Do you have hundreds of leader-only clients
for the Northbound DB? In general, only write-heavy clients actually need
to be leader-only.
There are a lot of leader-only clients due to the way the neutron API
server runs - each worker thread has a connection, and they are scaled
depending on processor count, so typically there are at least 32. Then
multiply that by three since there is HA involved.
Actually, I had a look at a recent report and there were 61 NB/62 SB
connections per system, so that would make ~185 for each server. I would
think in a typical deployment there might be closer to 100.
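
(As an aside, the knob Ilya is referring to is the leader-only flag on the
IDL. Neutron's workers go through the Python IDL, but in C IDL terms it
would look roughly like the sketch below; the remote string is a
placeholder and nbrec_idl_class is the generated class from ovn-nb-idl.h:)

/* Rough sketch only: a read-mostly client that does not insist on being
 * connected to the raft leader. */
#include "ovsdb-idl.h"
#include "ovn-nb-idl.h"

struct ovsdb_idl *
open_nb_read_mostly(void)
{
    struct ovsdb_idl *idl =
        ovsdb_idl_create("tcp:127.0.0.1:6641", &nbrec_idl_class,
                         true /* monitor everything */, true /* retry */);

    /* A client that is not leader-only stays connected across leader
     * changes instead of reconnecting to chase the new leader. */
    ovsdb_idl_set_leader_only(idl, false);
    return idl;
}

If most of those worker connections are read-mostly, keeping them off
leader-only would mean a leader change doesn't turn into a reconnect
stampede for them in the first place.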
Looking at their sockets I can see the backlog is only set to 10:
$ ss -ltm | grep 664
LISTEN 0 10 0.0.0.0:6641 0.0.0.0:*
LISTEN 0 10 0.0.0.0:6642 0.0.0.0:*
Digging into the code, there are only two places where listen() is
called, one being inet_open_passive():
/* Listen. */
if (style == SOCK_STREAM && listen(fd, 10) < 0) {
error = sock_errno();
VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
goto error;
}
There is no way to configure around this, even to test whether increasing
it would help in a running environment.
So my question is two-fold:
1) Should this be increased? 128, 256, 1024? I can send a patch.
2) Should this be configurable?
Has anyone else seen this?
I don't remember having any significant issues related to connection
timeouts, as they usually get resolved quickly. And if the server
doesn't accept the connection fast enough, it means that the server is
busy and there may not be a real benefit from having more connections
in the backlog. It may just hide the connection timeout warning while
the service will not actually be available for roughly the same amount
of time anyway. Having a lower backlog may allow clients to re-connect
to a less loaded server faster.
Understood, increasing the backlog might just hide the warnings and not
fix the issue.
I'll explain what seems to be happening, at least from looking at the
logs I have. All the worker threads in question are happily connected to
the leader. When the leader changes, there is a bit of a stampede while
they all try to reconnect to the new leader. But since they don't know
which of the three (again, HA) systems is the leader, they just pick
one of the other two. When they don't get the leader, they disconnect and
try another.
It might be there is something we can do on the neutron side as well,
the 10 backlog just seemed like the first place to start.
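
Just to make the idea concrete, what I was picturing for making it
configurable is something small like the sketch below. The
OVS_LISTEN_BACKLOG environment variable is purely hypothetical (there is
no such knob today), and whatever value ends up being passed is still
clamped by the kernel's net.core.somaxconn:

/* Hypothetical sketch: make the backlog tunable instead of hard-coding 10.
 * OVS_LISTEN_BACKLOG is a made-up name, not an existing OVS option. */
#include <stdlib.h>

static int
listen_backlog(void)
{
    const char *s = getenv("OVS_LISTEN_BACKLOG");
    if (s) {
        int n = atoi(s);
        if (n > 0) {
            return n;
        }
    }
    return 64;   /* A larger default than the current 10. */
}

    /* Listen. */
    if (style == SOCK_STREAM && listen(fd, listen_backlog()) < 0) {
        error = sock_errno();
        VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
        goto error;
    }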
That said, the original code clearly wasn't designed for a high
number of simultaneous connection attempts, so it makes sense to
increase the backlog to some higher value. I see Ihar re-posted his
patch doing that here:
https://patchwork.ozlabs.org/project/openvswitch/patch/20240403211818.10023-1-ihrac...@redhat.com/
I'll take a look at it.
Thanks, I plan on testing that as well.
One other thing that we could do is to accept more connections at a time.
Currently we accept one connection per event loop iteration. But we
need to be careful here as handling multiple initial monitor requests
for the database within a single iteration may be costly and may reduce
overall responsiveness of the server. Needs some research.
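
Just to make sure I follow, something roughly along these lines? A sketch
only, not the actual ovsdb-server accept path, and the cap of 8 is
arbitrary:

/* Sketch: accept a bounded batch of pending connections per poll-loop
 * iteration instead of a single one, so a reconnect stampede drains the
 * listen queue faster without one iteration doing unbounded work. */
#include <sys/socket.h>

#define MAX_ACCEPTS_PER_ITERATION 8

static void
accept_pending(int listen_fd)
{
    for (int i = 0; i < MAX_ACCEPTS_PER_ITERATION; i++) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            break;   /* EAGAIN/EWOULDBLOCK: queue drained for now. */
        }
        /* Hand fd off to the jsonrpc/session layer here. */
    }
}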
Having hundreds of leader-only clients for NB still sounds a little strange
to me, though.
There might be a better way, or I might be misunderstanding as well. We
actually have some meetings next week and I can add this as a discussion
topic.
Thanks,
-Brian
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss