On Fri, 2017-11-03 at 13:23 -0400, Vitaly Davidovich wrote:
> On Fri, Nov 3, 2017 at 12:05 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> > On Fri, 2017-11-03 at 11:13 -0400, Vitaly Davidovich wrote:
> >> Ok, an interesting finding. The client was originally running with
> >> SO_RCVBUF of 75K (apparently someone decided to set that for some
> >> unknown reason). I tried the test with a 1MB recv buffer and
> >> everything works perfectly! The client responds with 0 window alerts,
> >> the server just hits the persist condition and sends keep-alive
> >> probes; the client continues answering with a 0 window up until it
> >> wakes up and starts processing data in its receive buffer. At that
> >> point, the window opens up and the server sends more data. Basically,
> >> things look as one would expect in this situation :).
> >>
> >> /proc/sys/net/ipv4/tcp_rmem is 131072 1048576 20971520. The
> >> conversation flows normally, as described above, when I change the
> >> client's recv buf size to 1048576. I also tried 131072, but that
> >> doesn't work - same retrans/no ACKs situation.
> >>
> >> I think this eliminates (right?) any middleware from the equation.
> >> Instead, perhaps it's some bad interaction between a low recv buf size
> >> and either some other TCP setting or TSO mechanics (LRO specifically).
> >> Still investigating further.
> >
> > Just in case, have you tried a more recent linux kernel ?
> I haven't but will look into that. I was mostly hoping to see if
> anyone perhaps has seen similar symptoms/behavior and figured out what
> the root cause is - just a stab in the dark with the well-informed
> folks on this list :). As of right now, based on the fact that a 1MB
> recv buffer works, I would surmise the issue is perhaps some poor
> interaction between a lower recv buffer size and some other tcp
> settings. But I'm just speculating - will continue investigating, and
> I'll update this thread if I get to the bottom of it.
> >
> > I would rather not spend time on some problem that might already be
> > fixed.
> Completely understandable - I really appreciate the tips and pointers
> thus far Eric, they've been helpful in their own right.

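For anyone wanting to repeat the receive-buffer experiment quoted above, here is a minimal sketch, not the actual client code. It only sets SO_RCVBUF on an unconnected socket and prints what the kernel reports back. The helper name set_rcvbuf is made up for illustration, and "75K" is assumed to mean 75 * 1024 bytes.

```python
import socket


def set_rcvbuf(sock: socket.socket, requested: int) -> int:
    """Request a receive buffer size and return the size the kernel reports.

    Hypothetical helper, for illustration only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # Sizes mentioned in the thread; 75 * 1024 is an assumed reading of "75K".
    for requested in (75 * 1024, 131072, 1048576):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            effective = set_rcvbuf(s, requested)
            print(f"requested={requested} reported={effective}")
```

On Linux the reported value is normally twice the requested one (the kernel doubles it to cover bookkeeping overhead) and the request is capped by net.core.rmem_max. Setting SO_RCVBUF explicitly also turns off receive-buffer autotuning for that socket, so tcp_rmem no longer governs it.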
I am interested to see whether the issue with a small sk_rcvbuf is still
there.

We have an upcoming change to rcvbuf autotuning so that it does not
blindly give tcp_rmem[2] to all sockets, but instead uses a function
based on RTT.

That means local flows could use a small sk_rcvbuf instead of an
inflated one, and we could increase tcp_rmem[2] to better match modern
capabilities (more memory on hosts, larger BDP).
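
To illustrate why RTT matters for buffer sizing, here is a rough sketch of a bandwidth-delay product (BDP) calculation clamped to the tcp_rmem values quoted earlier in the thread. This is not the kernel's autotuning code, and the message above does not say what the RTT-based function actually is. The function names, the 10 Gbit/s link speed, and the RTT figures are all assumptions picked for the example.

```python
# tcp_rmem values (min, default, max) as quoted in the thread.
TCP_RMEM = (131072, 1048576, 20971520)


def bdp_bytes(bandwidth_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    return bandwidth_bps * rtt_us // (8 * 1_000_000)


def rtt_based_rcvbuf(bandwidth_bps: int, rtt_us: int, rmem=TCP_RMEM) -> int:
    """Hypothetical sizing rule: the BDP, clamped to [tcp_rmem[0], tcp_rmem[2]]."""
    lo, _default, hi = rmem
    return max(lo, min(bdp_bytes(bandwidth_bps, rtt_us), hi))


if __name__ == "__main__":
    link = 10_000_000_000  # assumed 10 Gbit/s link
    # Assumed RTTs: 100 us for a local flow, 100 ms for a long-distance flow.
    for label, rtt_us in (("local", 100), ("long-haul", 100_000)):
        print(label, bdp_bytes(link, rtt_us), rtt_based_rcvbuf(link, rtt_us))
```

Under these assumptions the local flow has a BDP of about 125 KB and would stay at the minimum buffer size. The long-haul flow has a BDP of about 125 MB, well above the current tcp_rmem[2] of 20971520, which is the case for raising that limit.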