On 01/22/2018 10:16 AM, Eric Dumazet wrote:
On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
My test case is to have 6 processes each create 5000 TCP IPv4
connections to each other on a system with 16GB RAM and send slow-speed
data. This works fine on a 4.7 kernel, but will not work at all on 4.13.
The 4.13 kernel first complains about running out of TCP memory, but
even after forcing those values higher, the max connections we can get
is around 15k.
Both kernels have my out-of-tree patches applied, so it is possible it
is my fault at this point.
Any suggestions as to what might be causing this, or whether it is fixed
in more recent kernels?
I will start bisecting in the meantime...
Hi Ben
Unfortunately I have no idea.
Are you using loopback flows, or have I misunderstood you?
How can loopback connections be slow-speed?
Hello Eric, it looks like one of your commits causes the issue I see.
Here are some more details on my specific test case I used to bisect:
I have two ixgbe ports looped back, configured on the same subnet, but
with different IPs.
Routing table rules, SO_BINDTODEVICE, and binding to specific IPs on
both the client and server side let me send-to-self over the external
looped cable.
I have 2 mac-vlans on each physical interface.
I created 5 server-side connections on one physical port, and two more
on one of the mac-vlans.
On the client side, I create a process that spawns 5000 connections to
the corresponding server side.
The end result is 25,000 connections on one pair of real interfaces, and
10,000 connections on the mac-vlan ports.
In the passing case, I get very close to all 5000 connections on all
endpoints quickly.
In the failing case, I get a max of around 16k connections on the two
physical ports. The two mac-vlans have 10k connections across them
working reliably. It seems to be an issue with 'connect' failing.
connect(2074, {sa_family=AF_INET, sin_port=htons(33012), sin_addr=inet_addr("10.1.1.5")}, 16) = -1 EINPROGRESS (Operation now in progress)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 2075
fcntl(2075, F_GETFD) = 0
fcntl(2075, F_SETFD, FD_CLOEXEC) = 0
setsockopt(2075, SOL_SOCKET, SO_BINDTODEVICE, "eth4\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(2075, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(2075, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.1.1.4")}, 16) = 0
getsockopt(2075, SOL_SOCKET, SO_RCVBUF, [87380], [4]) = 0
getsockopt(2075, SOL_SOCKET, SO_SNDBUF, [16384], [4]) = 0
setsockopt(2075, SOL_TCP, TCP_NODELAY, [0], 4) = 0
fcntl(2075, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(2075, F_SETFL, O_ACCMODE|O_NONBLOCK) = 0
connect(2075, {sa_family=AF_INET, sin_port=htons(33012), sin_addr=inet_addr("10.1.1.5")}, 16) = -1 EINPROGRESS (Operation now in progress)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 2076
fcntl(2076, F_GETFD) = 0
fcntl(2076, F_SETFD, FD_CLOEXEC) = 0
setsockopt(2076, SOL_SOCKET, SO_BINDTODEVICE, "eth4\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(2076, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(2076, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.1.1.4")}, 16) = 0
getsockopt(2076, SOL_SOCKET, SO_RCVBUF, [87380], [4]) = 0
getsockopt(2076, SOL_SOCKET, SO_SNDBUF, [16384], [4]) = 0
setsockopt(2076, SOL_TCP, TCP_NODELAY, [0], 4) = 0
fcntl(2076, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(2076, F_SETFL, O_ACCMODE|O_NONBLOCK) = 0
connect(2076, {sa_family=AF_INET, sin_port=htons(33012), sin_addr=inet_addr("10.1.1.5")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
....
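The per-socket sequence in the trace above can be approximated with a
short script. This is only a sketch, not the actual test tool: the names
open_client and describe and their parameters are made up for
illustration, and the device step is optional because SO_BINDTODEVICE is
Linux-specific and normally needs elevated privileges.

```python
import errno
import socket


def open_client(src_ip, dst_ip, dst_port, device=None):
    """Mirror the traced sequence: bind to src_ip with port 0, then start
    a non-blocking connect. Returns (socket, errno_value)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if device is not None:
        # Hypothetical device name such as "eth4"; usually requires root.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                     device.encode() + b"\0")
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Port 0 asks the kernel to choose the source port at bind() time,
    # before the destination is known.
    s.bind((src_ip, 0))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    s.setblocking(False)
    # connect_ex() returns the errno value instead of raising.
    err = s.connect_ex((dst_ip, dst_port))
    return s, err


def describe(err):
    """Turn an errno value into its symbolic name, e.g. EINPROGRESS."""
    if err == 0:
        return "connected"
    return errno.errorcode.get(err, str(err))
```

With device=None the SO_BINDTODEVICE step is skipped, so the sketch can
also be tried unprivileged against a listener on 127.0.0.1. In the
failing case shown in the trace, the value to watch for from connect is
errno.EADDRNOTAVAIL.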
ea8add2b190395408b22a9127bed2c0912aecbc8 is the first bad commit
commit ea8add2b190395408b22a9127bed2c0912aecbc8
Author: Eric Dumazet <eduma...@google.com>
Date: Thu Feb 11 16:28:50 2016 -0800

    tcp/dccp: better use of ephemeral ports in bind()
    Implement strategy used in __inet_hash_connect() in opposite way :
    Try to find a candidate using odd ports, then fallback to even ports.
    We no longer disable BH for whole traversal, but one bucket at a time.
    We also use cond_resched() to yield cpu to other tasks if needed.
    I removed one indentation level and tried to mirror the loop we have
    in __inet_hash_connect() and variable names to ease code maintenance.
    Signed-off-by: Eric Dumazet <eduma...@google.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

:040000 040000 3af4595c6eb6d331e1cba78a142d44e00f710d81 e0c014ae8b7e2867256eff60f6210821d36eacef M net
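The search order described in the commit message ("odd ports, then
fallback to even ports") can be pictured with a toy model. The following
is purely illustrative and is not the kernel code: bind_port_order,
pick_port, the in_use set, and the tiny port range are all assumptions
made for the sake of the example.

```python
def bind_port_order(low, high):
    """Yield every port in [low, high]: odd ports first, then even ones."""
    for parity in (1, 0):
        for port in range(low, high + 1):
            if port % 2 == parity:
                yield port


def pick_port(low, high, in_use):
    """Return the first port in that order that is not in in_use,
    or None if the whole range is taken."""
    for port in bind_port_order(low, high):
        if port not in in_use:
            return port
    return None


if __name__ == "__main__":
    print(list(bind_port_order(32768, 32773)))
    # -> [32769, 32771, 32773, 32768, 32770, 32772]
    print(pick_port(32768, 32773, {32769, 32771, 32773}))
    # -> 32768 (odd ports exhausted, so the search falls back to even)
```

In this model a port that is already in use is simply skipped; the real
bind() conflict rules (SO_REUSEADDR, bound device, and so on) are more
involved and are not modelled here.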
I will be happy to test patches or try to get any other results that
might help diagnose this problem better.