On Wed, Nov 8, 2017 at 12:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> Please do not top post on netdev.

Right - apologies for that.

> On Wed, 2017-11-08 at 11:04 -0500, Vitaly Davidovich wrote:
>> So this issue is somehow related to setting SO_RCVBUF *after*
>> connecting the socket (from the client). The system is configured
>> such that the default rcvbuf size is 1MB, but the code was shrinking
>> this down to 75Kb right after connect().
>
> What are you calling default rcvbuf size exactly ?
>
> Is the application doing
>
> s = socket(...);
> ...
> setsockopt(s, SOL_SOCKET, SO_RCVBUF, [1000000], 4)
> ...
> connect(s, ...)
> setsockopt(s, SOL_SOCKET, SO_RCVBUF, [75000], 4)

Yes, sort of. The application (Java, but nothing fancy here) does
essentially the following:

s = socket(...);
// no explicit setting of SO_RCVBUF size, but the system default
// should be picked up (1MB as tcp_rmem shows)
connect(s, ...);
// now it goes and sets it
setsockopt(s, SOL_SOCKET, SO_RCVBUF, 75000, ...);
// then it goes to sleep for 15 mins
sleep(...)

The client machine has /proc/sys/net/ipv4/tcp_rmem: 131072 1048576 20971520

>> I think that explains why
>> the window size advertised by the client was much larger than
>> expected. I see that the kernel does not want to shrink the
>> previously advertised window without advancement in the sequence
>> space. So my guess is that the client runs out of buffer and starts
>> dropping packets. Not sure how to further debug this from userspace
>> (systemtap? bpf?) - any tips on that front would be appreciated.
>
> You could provide a packet capture (tcpdump) for a start ;)

I might be able to share that (this is from a private network). In the
meantime, if there's something specific I should look at there, I'd be
happy to do that and report back. I understand that's not ideal, but
it would be faster/easier.

My own observation is that the client's last ACK has a window size of
>300KB, which I'm pretty sure it doesn't have room for if the rcvbuf
was shrunk when the setsockopt() set it to 75000 (I understand the
kernel actually reserves more than that, but even if it's double,
that's still far less than room for 300KB).

Needless to say, if I move the setsockopt(s, SOL_SOCKET, SO_RCVBUF,
75000, ...) prior to connect(s, ...), then everything works fine - we
hit a "persist" state, and there's a zero window alert and probing by
the server. I've tried a few other buffer sizes, including smallish
ones like 4KB and 8KB, and they all work (no real surprise there, but
it was more of a sanity check).

The fact that SO_RCVBUF is set after connect() is a bug in the code -
no doubt about it. However, I'm surprised it wedges the stack like
this.

Another interesting bit is that if the client isn't put to sleep but
allowed to read the bytes as they come in, then everything works fine
as well. So it's not like the stack is broken outright - I need to put
the client to sleep to hit this (but it reproduces 100% of the time
thus far).

Thanks Eric
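
For reference, a minimal C sketch of the ordering that works above:
SO_RCVBUF is set right after socket() and before any connect(), and the
value the kernel actually applied is read back with getsockopt(). The
helper names socket_with_rcvbuf() and effective_rcvbuf() are made up for
illustration; the Java application's real code is not shown in this
thread, so treat this as an assumption about its equivalent in C.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket and set SO_RCVBUF before
 * connect() is ever called, so the requested size is already in place
 * when the handshake advertises a window. Returns the fd, or -1. */
int socket_with_rcvbuf(int rcvbuf_bytes)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical helper: read back the receive buffer size the kernel
 * applied. On Linux this is typically double the requested value
 * (bookkeeping overhead), capped by net.core.rmem_max. */
int effective_rcvbuf(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}
```

A caller would do s = socket_with_rcvbuf(75000), then connect(s, ...),
and could log effective_rcvbuf(s) before and after connect() to compare
it with the window seen in a packet capture.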