On 04/06/2018 02:07 AM, Dominique Martinet wrote: > (current kernel: vanilla 4.14.29) > > I've been running into troubles recently where a sockets "fills up" (as > in, select() will no longer return it in its outfd / attempting to write > to it after setting it to NONBLOCK will return -EWOULDBLOCK) and it > doesn't seem to ever "unfill" until the tcp connexion timeout. > > The previous time I pushed it down to the application for not trying to > read the socket either as I assume the buffers could be shared and > just waiting won't take data out, but this time I'm a bit more > skeptical as socat waits for the fd in both read and write... > > Let me take a minute to describe my setup, I don't think that how the > socket was created matters but it might be interesting: > - I have two computers behind NATs, no port forwarding on either side > - One (call it C for client) runs ssh with a proxycommand ncat/socat to > control the source port, e.g. > $ ssh -o ProxyCommand="socat stdio tcp:<server public > ip>:<port1>,sourceport=<port2>" server > - The server runs another socat to connect to that and forwards to ssh > locally, e.g. > $ socat tcp:<client public ip>:<port2>,sourceport=<port1> tcp:127.0.0.1:22 > > (yes, both are connect() calls, and that just works™ thanks to initial > syn replay and conntrack on routers) > > When things stall, the first socat is in a select with both fd in > reading, so it's waiting data. > The second socat has data coming from ssh and is in a select with the > client-facing socket in both read and write - that select never returns > so the socket is not available for reading or writing and does not free > up until the connection eventually times out a few minutes later. > > At this point, I only see tcp replays in tcpdump/wireshark. I've > compared dumps from both sides and there are no lost packets, only > reordering - there always is a batch of acks that were sent regularily > coming in shortly before the hang. Here's the trace on the server: > > 16:49:26.735042 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 70476:71850, ack 4190, win 307, options [nop,nop,TS val 1313937641 > ecr 1617129473], length 1374 > 16:49:26.735046 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 71850:73224, ack 4190, win 307, options [nop,nop,TS val 1313937641 > ecr 1617129473], length 1374 > 16:49:26.735334 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 41622, win 918, options [nop,nop,TS val 1617129478 ecr 1313937609], > length 0 > 16:49:26.736005 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 42996, win 940, options [nop,nop,TS val 1617129478 ecr 1313937609], > length 0 > 16:49:26.736402 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 73224:74598, ack 4190, win 307, options [nop,nop,TS val 1313937643 > ecr 1617129473], length 1374 > 16:49:26.736408 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 74598:75972, ack 4190, win 307, options [nop,nop,TS val 1313937643 > ecr 1617129473], length 1374 > 16:49:26.738561 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 44370, win 963, options [nop,nop,TS val 1617129482 ecr 1313937616], > length 0 > 16:49:26.739539 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 45744, win 986, options [nop,nop,TS val 1617129482 ecr 1313937616], > length 0 > 16:49:26.739882 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 47118, win 1008, options [nop,nop,TS val 1617129484 ecr 1313937617], > length 0 > 16:49:26.740255 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 48492, win 1031, options [nop,nop,TS val 1617129484 ecr 1313937617], > length 0 > 16:49:26.746756 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 49866, win 1053, options [nop,nop,TS val 1617129493 ecr 1313937627], > length 0 > 16:49:26.747923 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 51240, win 1076, options [nop,nop,TS val 1617129494 ecr 1313937627], > length 0 > 16:49:26.749083 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 52614, win 1099, options [nop,nop,TS val 1617129495 ecr 1313937629], > length 0 > 16:49:26.750171 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 53988, win 1121, options [nop,nop,TS val 1617129496 ecr 1313937629], > length 0 > 16:49:26.750808 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 55362, win 1144, options [nop,nop,TS val 1617129497 ecr 1313937629], > length 0 > 16:49:26.754648 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 56736, win 1167, options [nop,nop,TS val 1617129500 ecr 1313937629], > length 0 > 16:49:26.755985 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 58110, win 1189, options [nop,nop,TS val 1617129501 ecr 1313937630], > length 0 > 16:49:26.758513 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 59484, win 1212, options [nop,nop,TS val 1617129502 ecr 1313937630], > length 0 > 16:49:26.759096 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 60858, win 1234, options [nop,nop,TS val 1617129503 ecr 1313937635], > length 0 > 16:49:26.759421 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 62232, win 1257, options [nop,nop,TS val 1617129503 ecr 1313937635], > length 0 > 16:49:26.759755 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 63606, win 1280, options [nop,nop,TS val 1617129504 ecr 1313937636], > length 0 > 16:49:26.760653 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 64980, win 1302, options [nop,nop,TS val 1617129505 ecr 1313937636], > length 0 > 16:49:26.761453 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 66354, win 1325, options [nop,nop,TS val 1617129506 ecr 1313937638], > length 0 > 16:49:26.762199 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 67728, win 1348, options [nop,nop,TS val 1617129507 ecr 1313937638], > length 0 > 16:49:26.763547 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 67728, win 1348, options [nop,nop,TS val 1617129507 > ecr 1313937638], length 36 > 16:49:26.763553 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 70476, win 1393, options [nop,nop,TS val 1617129508 ecr 1313937639], > length 0 > 16:49:26.764298 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 73224, win 1438, options [nop,nop,TS val 1617129509 ecr 1313937641], > length 0 > 16:49:26.764676 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 75972, win 1444, options [nop,nop,TS val 1617129510 ecr 1313937643], > length 0 > 16:49:26.807754 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 75972:77346, ack 4190, win 307, options [nop,nop,TS val 1313937714 > ecr 1617129473], length 1374 > 16:49:26.876467 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617129620 ecr 1313937714], > length 0
There is no way a regular TCP stack (including linux) could send the following frame. > 16:49:27.048760 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313937955 > ecr 1617129473], length 1374 So something is mangling the packet, maybe NAT or something. > 16:49:27.051791 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617129762 > ecr 1313937714], length 36 > 16:49:27.076444 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617129822 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:27.371182 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617130018 > ecr 1313937714], length 36 > 16:49:27.519862 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313938426 > ecr 1617129473], length 1374 > 16:49:27.547662 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617130293 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:27.883372 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617130530 > ecr 1313937714], length 36 > 16:49:28.511861 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313939418 > ecr 1617129473], length 1374 > 16:49:28.538891 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617131285 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:28.907197 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617131554 > ecr 1313937714], length 36 > 16:49:30.431864 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313941338 > ecr 1617129473], length 1374 > 16:49:30.459127 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617133204 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:30.955388 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617133602 > ecr 1313937714], length 36 > 16:49:34.207879 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313945114 > ecr 1617129473], length 1374 > 16:49:34.235726 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617136981 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:35.256285 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617137954 > ecr 1313937714], length 36 > 16:49:42.143864 IP <server local ip>.13317 > <client public ip>.31872: Flags > [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313953050 > ecr 1617129473], length 1374 > 16:49:42.171531 IP <client public ip>.31872 > <server local ip>.13317: Flags > [.], ack 77346, win 1444, options [nop,nop,TS val 1617144917 ecr > 1313937714,nop,nop,sack 1 {32004:33378}], length 0 > 16:49:43.448262 IP <client public ip>.31872 > <server local ip>.13317: Flags > [P.], seq 4190:4226, ack 77346, win 1444, options [nop,nop,TS val 1617146146 > ecr 1313937714], length 36 > > (I still have the pcap files if someone wants to see them, but I'd > rather not give my work IP publicly so will send it privately on > request) > > > At this point, only the same 3 packets keep being replayed over and > over... According to 'ss' the Send-Q isn't empty on either side, the > client has some ~1k to send but the server has much more (87k) > After increasing the window size through net.*wmem sysctl it got stuck > with over 1MB in Send-Q, which makes sense because the socket is full... > > > So, what I don't get is, why are these acks continuously replayed? Given > the timing it looks like the server doesn't take the client acks into > account, despite having received that precise 33378 ack earlier and I > believe it should accept a higher ack value anyway ? > > The ultimate question being, how can I go about debugging that? > I'm working on getting perf probe/crash to work on the server so I can > look at the kernel side now, I can reproduce this semi-easily so I'm > sure I'll get down to it eventually, but if anyone has an idea I'm all > ears. > > > Thanks! >